Abstract

We characterize fundamental limits for distributed lossless source coding (the Slepian-Wolf problem), the
multiple-access channel and the asymmetric broadcast channel in the finite blocklength setting. For the
Slepian-Wolf problem, we introduce a fundamental quantity known as the entropy dispersion matrix, which is analogous
to the scalar dispersion quantities that have gained interest in the recent literature. We show that if this matrix is
positive-definite, the optimal rate region under the constraint of a fixed blocklength and non-zero error probability
has a curved boundary, in contrast to the polyhedral boundary of the asymptotic Slepian-Wolf region. In addition, the
entropy dispersion matrix governs the rate of convergence of the non-asymptotic region to the asymptotic one. We
develop a general universal achievability procedure for finite blocklength analyses of other network information
theory problems such as the multiple-access channel and broadcast channel. We provide inner bounds to these
problems using a key result known as the vector rate redundancy theorem, which is proved using a multidimensional
version of the Berry-Esseen theorem. We show that a so-called information dispersion matrix characterizes these
inner bounds. Numerical examples show how the non-asymptotic Slepian-Wolf region and multiple-access inner
bound compare to their asymptotic counterparts. We also demonstrate numerically that the required blocklengths
predicted by dispersion analysis are generally smaller than those predicted by error exponent analysis.
I. Introduction

Network information theory [1] aims to find the fundamental limits of communication in networks with multiple
senders and receivers. The primary goal is to characterize the optimal rate region or capacity region, that is,
the set of rate tuples for which there exist codes with reliable transmission. Such rate tuples are said to be
achievable. While the characterization of capacity regions is a difficult problem in general, there have been positive
results for several special classes of networks such as the multiple-access channel [2], [3] and the asymmetric [4]
or degraded broadcast channels [5], [6]. A prominent example in multi-terminal lossless source coding in which
the optimal rate region is known is the so-called Slepian-Wolf problem [7], which involves separately encoding two
(or more) correlated sources and subsequently estimating them from their rate-limited representations.

The capacity region for a specific source or channel model is an asymptotic notion. One is allowed to design codes
that operate over arbitrarily long blocks (or channel uses) in order to drive either the maximal or average probabilities
of error to zero. To illustrate this point, let us recap Shannon's point-to-point channel coding theorem [8]. He showed
that up to $nC$ bits can be reliably transmitted over $n$ uses of a discrete memoryless channel (DMC) as $n$ becomes
large. Here, $C = \max_{P_X} I(P_X, W)$ is termed the capacity of the channel $W$. However, this fundamental result for
reliable communication over a noisy channel can be optimistic in practice as there may be system constraints on
the delay in decoding. One can thus ask a slightly different and more challenging question: What is the maximal
rate of transmission $R^*(n, \epsilon)$ as a function of a fixed blocklength $n$ and target average error probability $\epsilon$? This
problem has been studied rather extensively recently. Perhaps the most prominent work is that by Polyanskiy, Poor
and Verdú [9], who showed using Gaussian approximations (and the Berry-Esseen theorem [10, Ch. XVI.5]) that

$$R^*(n, \epsilon) \approx C - \sqrt{\frac{V}{n}}\, Q^{-1}(\epsilon). \tag{1}$$
The constant $V$ coincides with an operational quantity known as the channel dispersion. It can be shown that the
channel dispersion is the variance of the log-likelihood ratio of the channel $W$ and the capacity-achieving output
distribution $p_Y^*$, assuming uniqueness of the capacity-achieving input distribution $p_X^* := \arg\max_{P} I(P, W)$. The
term $\sqrt{V/n}\, Q^{-1}(\epsilon)$ is the rate penalty in the finite blocklength setting. In another prominent work, Hayashi [11]
studied the so-called second-order channel coding rates from an information spectrum perspective. Both [11] and [12]
noted that (1) holds verbatim for the additive white Gaussian noise (AWGN) channel [13].
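To make the Gaussian approximation in (1) concrete, the following minimal sketch evaluates its right-hand side for a binary symmetric channel (BSC). The BSC, its crossover probability `p`, and all function names are illustrative assumptions that do not appear in the text; the closed forms used for the BSC capacity and dispersion are the standard ones, and lower-order terms are ignored.

```python
from math import log2, sqrt
from statistics import NormalDist


def q_inv(eps: float) -> float:
    """Inverse of the Gaussian Q-function, where Q(z) = 1 - Phi(z)."""
    return NormalDist().inv_cdf(1.0 - eps)


def bsc_capacity(p: float) -> float:
    """Capacity in bits per channel use of a BSC with crossover probability p."""
    return 1.0 + p * log2(p) + (1.0 - p) * log2(1.0 - p)


def bsc_dispersion(p: float) -> float:
    """Dispersion in bits^2 of a BSC (assumed example channel, 0 < p < 1/2)."""
    return p * (1.0 - p) * log2((1.0 - p) / p) ** 2


def normal_approx_rate(n: int, eps: float, p: float) -> float:
    """Right-hand side of (1): C - sqrt(V / n) * Q^{-1}(eps)."""
    return bsc_capacity(p) - sqrt(bsc_dispersion(p) / n) * q_inv(eps)


if __name__ == "__main__":
    # The rate penalty shrinks like 1 / sqrt(n) as the blocklength grows.
    for n in (100, 1000, 10000):
        print(n, round(normal_approx_rate(n, 1e-3, 0.11), 4))
```

For a fixed $\epsilon < 1/2$ the approximate rate increases with $n$ towards $C$, which is the behaviour the rate penalty term describes.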

In this paper, we ask similar non-asymptotic, dispersion-centered questions for three multi-user problems:
distributed lossless source coding, also known as the Slepian-Wolf (SW) problem, the multiple-access channel (MAC)
and the asymmetric broadcast channel (ABC). We show that the network analogue of the scalar dispersion quantity $V$
is a positive-semidefinite matrix $\mathbf{V}$ that generally depends on the channel, input distributions or sources. Our results
are of practical importance due to the ubiquity of communication networks where numerous users simultaneously
share a data compression system or utilize a common channel. There is also a pressing need to understand the
finite blocklength behaviour of such multi-terminal systems since there may be hard constraints on the permissible
number of channel uses, i.e., the delay in decoding. For example, given a tolerable average error probability of $\epsilon$ and
a blocklength $n$, what is the set of achievable rate pairs $(R_1, R_2)$ for two non-cooperating parties to communicate
to a common destination? Our results in Section III for the MAC provide a partial answer (achievability/inner
bound) to this question. As the method of types [14] is used as a key proof technique in this paper, we focus on
discrete memoryless systems but, at various points in the paper, we will also comment on how our techniques can
be generalized to systems with arbitrary alphabets.



A. Summary of Main Results 

There are three main results in this paper: 

• For the Slepian-Wolf (SW) problem, we define the $(n, \epsilon)$-optimal rate region $\mathscr{R}^*_{\mathrm{SW}}(n, \epsilon)$ as the set of rate
pairs $(R_1, R_2)$ for which there exists a code that guarantees that the error probability in reconstructing the
sources does not exceed $\epsilon$. We characterize $\mathscr{R}^*_{\mathrm{SW}}(n, \epsilon)$ up to an $O(\frac{\log n}{n})$ factor. Furthermore, the implied
constants in the $O(\cdot)$ terms depend only on the cardinality of the alphabets. Roughly speaking, our SW
result (Theorem 1) says that $\mathscr{R}^*_{\mathrm{SW}}(n, \epsilon)$ is the set of rate pairs $(R_1, R_2)$ satisfying

$$\begin{bmatrix} R_1 \\ R_2 \\ R_1 + R_2 \end{bmatrix} \in \begin{bmatrix} H(X_1|X_2) \\ H(X_2|X_1) \\ H(X_1, X_2) \end{bmatrix} + \frac{1}{\sqrt{n}} \mathscr{S}(\mathbf{V}, \epsilon) + O\Big(\frac{\log n}{n}\Big) \mathbf{1}, \tag{2}$$

where the set $\mathscr{S}(\mathbf{V}, \epsilon) \subset \mathbb{R}^3$, defined precisely in (6) and diagrammed in Fig. 1, is a multidimensional analogue
of the cumulative distribution function for a zero-mean multivariate Gaussian with covariance matrix $\mathbf{V}$. See
Fig. 2 for a plot of $\mathscr{R}^*_{\mathrm{SW}}(n, \epsilon)$ ignoring the $O(\frac{\log n}{n})$ terms. To prove the direct part of this result, we introduce
a coding scheme based on random binning and empirical entropy thresholding and we analyze the error
probability. We argue, by providing a converse proof, that such a coding scheme is indeed optimal up to the
$O(\frac{\log n}{n})$ term. In the course of doing so, we introduce a fundamental quantity called the entropy dispersion
matrix of $p_{X_1,X_2}$. This is the matrix $\mathbf{V}$ that appears in (2). We show that if this matrix is non-singular,
the boundary of $\mathscr{R}^*_{\mathrm{SW}}(n, \epsilon)$ is, unlike that of the asymptotic SW region, a smooth curve. We demonstrate
numerically how our region compares to the SW region and to the problem of finite blocklength source coding
with side information both at the decoder and at the encoders. Importantly, we also derive the effective dispersion
as a pair of rates approaches a point on the boundary of the asymptotic rate region along a line with a specified
gradient. Finally, we conduct some numerical experiments to compare the blocklengths predicted by our bounds
in (2) to those of Gallager-style error exponent analysis for the SW problem [15], [16].

The analysis of the coding scheme for the SW problem is based on a multidimensional Berry-Esseen
theorem [17], a Gaussian approximation for the distribution of the sum of independent random vectors. We use
this powerful theorem to derive a general result known as the vector rate redundancy theorem, which, as we
will see, is applicable to several other network information theory problems.

• For the discrete memoryless MAC, we leverage unified and conceptually simple achievability techniques
to derive an inner bound to the $(n, \epsilon)$-capacity region $\mathscr{C}^*_{\mathrm{MAC}}(n, \epsilon)$. We show that this inner bound for a fixed
time-sharing distribution and two fixed input distributions is, in general, not a pentagon, unlike the traditional
capacity region [2], [3]. More precisely, we show that for these fixed input distributions, if the information
dispersion matrix (the mutual information analogue of the entropy dispersion matrix) is full rank, then the
region has a curved boundary. Indeed, the inner bound for the MAC is dual to the non-asymptotic SW region
in (2). We discuss the obstacles to proving a converse that matches the inner bound. We demonstrate using
numerical examples how the non-asymptotic MAC region compares with the asymptotic MAC region.
• To demonstrate the full utility of our achievability proof technique, we apply it to derive an inner bound for the
$(n, \epsilon)$-capacity region $\mathscr{C}^*_{\mathrm{ABC}}(n, \epsilon)$ of the discrete memoryless asymmetric broadcast channel (also known as the
broadcast channel with degraded message sets). We use the superposition coding technique [5] and a version
of maximum mutual information (MMI) decoding [18], but we apply a more delicate analysis to bound the
error probabilities of the constituent error events. Similar to the MAC, we show that an appropriately defined
information dispersion matrix governs the rate at which our inner bound approaches the capacity region first
proved by Körner and Marton [4]. Unfortunately, as with the MAC, we currently do not have a converse.

B. Related Work 

Dispersion or finite blocklength analysis for channel coding was studied extensively in the work by Polyanskiy
et al. [9], who introduced new and tight channel coding rate bounds and used these bounds to strengthen the results
in the seminal work by Strassen [19]. Baron et al. [20] focused on the binary symmetric channel and compared
the results to those of Shannon for the AWGN channel [13]. Such finite blocklength analysis has promptly been
extended to lossy source coding [21], [22], joint source-channel coding [23], channel coding with states [24], [25],
and infinite constellations [26], among others. The study of the effect of finite blocklengths on information theory
problems is also connected to second-order coding rates [11], [27], [28] and moderate deviations analysis [29]-[31].
It was noted in [32] that the relation in (1) may be derived in an alternative manner using saddlepoint (or Laplace)
approximations of the random-coding union (RCU) and dependence testing (DT) bounds in [9]. Dispersion analysis
is complementary to traditional error exponent analysis [15], [16], [33], [34]. In the latter, we fix a rate tuple
in the capacity region and ask how rapidly the error probability decays as an exponential function of the blocklength.
In the former, the error probability and the blocklength are fixed, and the focus is on the rates that are achievable
at the specified blocklength and error probability. We compare our dispersion analysis to traditional error exponent
analysis for the SW problem.

The problem of SW coding in the finite blocklength, fixed error probability setting was discussed by Baron et
al. [35], Sarvotham et al. [36] and He et al. [37]. However, in these works, the authors considered a single source
$X_1$ to be compressed and (non-coded) side information $X_2$ available only at the decoder. Thus, $X_2$ is neither coded
nor estimated. They showed that a scalar dispersion quantity governs the second-order coding rate. He et al. [37]
also analyzed a variable-length variant of the SW problem and showed that the dispersion is, in general, smaller
than in the fixed-length setting. Due to the duality between SW coding and channel coding, this variable-length
dispersion was shown to be similar to that for channel coding [9]. Sarvotham et al. [38] considered the SW problem
with two sources to be compressed but limited their setting to the case where the sources are binary and symmetric. They
demonstrated a result analogous to [35]. The three constraints on the individual rates $R_1$, $R_2$ and the sum rate
$R_1 + R_2$ are decoupled when the sources are binary and symmetric. Similar conclusions were made by Chang
and Sahai [39] from an error exponent perspective. Our work generalizes their setting in that we consider all finite
alphabet sources with multiple encoders. We discuss further connections in Sections II-B2 and VI-A4.

To the best of the authors' knowledge, the SW problem is the only network information theory problem in which 
second-order coding rates and/or finite blocklength behaviours have been studied and published. Through personal 
communication with Prof. Pierre Moulin (UIUC) [40], the authors also came to know that Prof. Moulin has been 
working concurrently on finite blocklength analysis for the MAC.

C. Paper Organization 

This paper is organized as follows: In the following subsection, we introduce our notation. In Section II, we present
our finite blocklength results for the problem of distributed lossless source coding (the SW problem). Following that,
in Sections III and IV, we present our inner bounds for the MAC and ABC respectively. The organization within
each of Sections II-IV is common: We first remind the reader of existing asymptotic results. Then we provide
precise definitions for the problem at hand. We then state the finite blocklength theorem and finally we discuss the
implications of the theorem. In Section V, we discuss the effect of approaching a specific point on the asymptotic
capacity region from a specified angle. We also compute the effective dispersion. In Section VI, we present several
numerical examples to illustrate our finite blocklength regions for the SW problem and the MAC. We conclude our
discussion and suggest avenues for further research in Section VII. Most of the proofs are presented in Section VIII,
where we start by presenting a general result known as the vector rate redundancy theorem. We subsequently apply it
in the achievability proofs for the SW problem, the MAC and the ABC. Proofs of auxiliary results are relegated
to the appendices.

D. Notation 

Throughout the paper, we adopt the following set of notation: Random variables and the values they take on
will be denoted by upper case (e.g., $X$) and lower case (e.g., $x$) respectively. Random vectors will be denoted by
upper case bold font or with a superscript indicating their length (e.g., $\mathbf{X}$ or $X^n = (X_1, \ldots, X_n)$). Their realizations
will be denoted by lower case bold font or with a superscript (e.g., $\mathbf{x}$ or $x^n = (x_1, \ldots, x_n)$). Matrices will also
be denoted by upper case bold font (e.g., $\mathbf{M}$); this should hopefully cause no confusion with random vectors.
The notation $\mathbf{M}^T$ denotes the transpose of $\mathbf{M}$. The notations $\mathbf{M} \succ 0$ and $\mathbf{M} \succeq 0$ mean that $\mathbf{M}$ is (symmetric)
positive-definite and positive-semidefinite respectively. In addition, $\lambda_{\min}(\mathbf{M})$ and $\lambda_{\max}(\mathbf{M})$ denote, respectively,
the minimum and maximum eigenvalues of $\mathbf{M}$. The $(i, j)$ element of $\mathbf{M}$ is denoted as $[\mathbf{M}]_{i,j}$. For a vector $\mathbf{v} \in \mathbb{R}^d$,
$\|\mathbf{v}\|_q := (\sum_{t=1}^{d} |v_t|^q)^{1/q}$ designates the $\ell_q$ norm for $q \in [1, \infty]$. The notation $\mathbf{1}$ denotes the vector of all ones;
the length will be clear from the context. For two vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^d$, the notation $\mathbf{u} \le \mathbf{v}$ means $u_t \le v_t$ for all
$t = 1, \ldots, d$. The notation $\mathbf{u} \ge \mathbf{v}$ is defined similarly. Sets and events will be denoted by calligraphic font (e.g.,
$\mathcal{X}$). The complement of $\mathcal{E}$ is denoted as $\mathcal{E}^c$. Subsets of Euclidean space will be denoted by script font (e.g., $\mathscr{S}$).

Types (empirical distributions) will be denoted by upper case (e.g., $P$) and distributions by lower case (e.g., $p$).
The set of distributions supported on a finite set $\mathcal{X}$ and the set of $n$-types supported on $\mathcal{X}$ will be denoted by
$\mathscr{P}(\mathcal{X})$ and $\mathscr{P}_n(\mathcal{X})$ respectively. For a sequence $x^n \in \mathcal{X}^n$, the type is denoted as $P_{x^n}$. The set of all sequences
whose type is some $P$ is denoted as $\mathcal{T}_P = \mathcal{T}_P^n$, the type class. The superscript $n$ is suppressed throughout. For two
sequences $x^n \in \mathcal{X}^n$, $y^n \in \mathcal{Y}^n$, the conditional type of $y^n$ given $x^n$ is the stochastic matrix $V : \mathcal{X} \to \mathcal{Y}$ satisfying
$P_{x^n}(a) V(b|a) = P_{x^n,y^n}(a, b)$ for all $(a, b) \in \mathcal{X} \times \mathcal{Y}$. The set of $y^n$ with conditional type $V$ given $x^n$ is denoted
by $\mathcal{T}_V(x^n) = \mathcal{T}_V^n(x^n)$, the $V$-shell of $x^n$. The family of stochastic matrices $V : \mathcal{X} \to \mathcal{Y}$ for which the $V$-shell
$\mathcal{T}_V(x^n)$ of a sequence $x^n$ of type $P \in \mathscr{P}_n(\mathcal{X})$ is not empty is denoted as $\mathscr{V}_n(\mathcal{Y}; P)$ [14, Sec. 2.5]. In other words,
$V \in \mathscr{V}_n(\mathcal{Y}; P)$ if and only if $n P(a) V(b|a)$ is an integer for all $(a, b) \in \mathcal{X} \times \mathcal{Y}$.

Entropy and conditional entropy are denoted as $H(X) = H(p_X)$ and $H(Y|X) = H(p_{Y|X}|p_X)$ respectively.
Mutual information is denoted as $I(X; Y) = I(p_X, p_{Y|X})$. We often make the dependence on the distribution
explicit. Let $x^n, y^n$ be a pair of sequences for which $y^n$ has conditional type $V$ given $x^n$ and let $X$ and
$Y$ be dummy random variables with joint distribution $P_{x^n,y^n}$. Then, the notations $\hat{H}(x^n) = H(P_{x^n}) = H(X)$
and $\hat{H}(y^n|x^n) = H(V|P_{x^n}) = H(Y|X)$ denote, respectively, the empirical marginal and conditional entropies.
Note that empirical information quantities will generally be denoted with hats. So for example, the
empirical mutual information of the random variables $X, Y$ above will be denoted interchangeably as $\hat{I}(x^n \wedge y^n) =
I(P_{x^n}, V) = I(X; Y)$. Empirical conditional mutual information is defined similarly.
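The definitions of types, conditional types and empirical information quantities can be illustrated with a short sketch. The example sequences `x` and `y` and all function names below are hypothetical and chosen only for illustration; logarithms are to the base 2, as in the rest of the paper.

```python
from collections import Counter
from math import log2


def type_of(xs):
    """Type (empirical distribution) P_{x^n} of a sequence, as a dict."""
    n = len(xs)
    return {a: c / n for a, c in Counter(xs).items()}


def conditional_type(xs, ys):
    """Conditional type V(b|a) of y^n given x^n, for symbols a that occur in x^n."""
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    return {(a, b): c / count_x[a] for (a, b), c in joint.items()}


def entropy(dist):
    """Entropy in bits of a distribution given as a dict of probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def empirical_mutual_information(xs, ys):
    """I(P_{x^n}, V), computed as H(x^n) + H(y^n) - H(x^n, y^n)."""
    joint_type = type_of(list(zip(xs, ys)))
    return entropy(type_of(xs)) + entropy(type_of(ys)) - entropy(joint_type)


# Hypothetical pair of binary sequences of length n = 8.
x = [0, 0, 1, 1, 0, 1, 0, 1]
y = [0, 1, 1, 1, 0, 0, 0, 1]
P = type_of(x)
V = conditional_type(x, y)
```

Here `P[a] * V[(a, b)]` reproduces the joint type, and `len(x) * P[a] * V[(a, b)]` is an integer for every pair, which is the membership condition for $\mathscr{V}_n(\mathcal{Y}; P)$ stated above.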

The multivariate Gaussian probability density function with mean $\mathbf{m}$ and covariance $\mathbf{\Lambda}$ is denoted as $\mathcal{N}(\mathbf{u}; \mathbf{m}, \mathbf{\Lambda})$
or more simply as $\mathcal{N}(\mathbf{m}, \mathbf{\Lambda})$. For a standard univariate Gaussian $\mathcal{N}(u; 0, 1)$, the cumulative distribution function
and $Q$-function are defined as $\Phi(z) := \int_{-\infty}^{z} \mathcal{N}(u; 0, 1)\, du$ and $Q(z) := 1 - \Phi(z)$ respectively. The functional
inverse of the $Q$-function is denoted as $Q^{-1}(\epsilon)$. We write $X \sim \mathrm{Bern}(q)$ if $\mathbb{P}(X = 1) = q$
and $\mathbb{P}(X = 0) = 1 - q$. Logarithms are to the base 2 (except when specifically indicated in Section VI). We also
use the discrete interval notation $[2^{nR}] := \{1, \ldots, \lceil 2^{nR} \rceil\}$. Asymptotic notation such as $o(\cdot)$, $O(\cdot)$ and $\Theta(\cdot)$ is
used throughout. See [41, Sec. 1.3] for definitions.

II. Dispersion of Distributed Lossless Source Coding 

Distributed lossless source coding consists of separately encoding two (or more) correlated sources $(X_1^n, X_2^n) \sim
\prod_{k=1}^{n} p_{X_1,X_2}(x_{1k}, x_{2k})$ into a pair of rate-limited messages $(M_1, M_2) \in [2^{nR_1}] \times [2^{nR_2}]$. Subsequently, given these
compressed versions of the sources, a decoder seeks to reconstruct $(X_1^n, X_2^n)$. One of the most remarkable results
in information theory, proved by Slepian and Wolf in 1973 [7], states that the set of achievable rate pairs $(R_1, R_2)$
is asymptotically equal to that when each of the encoders is also given knowledge of the other source, i.e., encoder
1 knows $X_2$ and vice versa. The optimal rate region $\mathscr{R}^*_{\mathrm{SW}}$ is given by the polyhedron

$$\begin{aligned} R_1 &\ge H(X_1|X_2) \\ R_2 &\ge H(X_2|X_1) \\ R_1 + R_2 &\ge H(X_1, X_2). \end{aligned} \tag{3}$$

As with most other statements in information theory [14], this result is asymptotic in nature. In this section, we
analyze the finite blocklength limits of distributed lossless source coding, which is also known as the SW problem.

We will focus on the two-sender case. Generalizations will turn out to be straightforward. A two-sender SW
code is characterized by four parameters: the blocklength $n$, the rates of the first and second sources $(R_1, R_2)$ and
the probability of error defined as

$$P_e^{(n)} := \mathbb{P}\big((\hat{X}_1^n, \hat{X}_2^n) \ne (X_1^n, X_2^n)\big), \tag{4}$$

where $\hat{X}_1^n$ and $\hat{X}_2^n$ are the reconstructed versions of $X_1^n$ and $X_2^n$ respectively. Traditionally, we require the error
probability $P_e^{(n)} \to 0$ as the blocklength $n \to \infty$. In this section (as with the rest of this paper), we fix the
blocklength $n$ and require the code to be such that $P_e^{(n)} \le \epsilon$. We then ask what the set of achievable pairs
of rates is as a function of $(n, \epsilon)$. A more challenging task would be to consider the constituent error probabilities
$\mathbb{P}(\hat{X}_1^n \ne X_1^n)$, $\mathbb{P}(\hat{X}_2^n \ne X_2^n)$ and $P_e^{(n)}$ and place three different upper bounds $\epsilon_1$, $\epsilon_2$ and $\epsilon_3$ on these probabilities.
We choose to consider the single compound error probability in (4) for simplicity. Our main result in this section
is a characterization of the $(n, \epsilon)$-optimal rate region up to a small $O(\frac{\log n}{n})$ factor. The implied constants in the
$O(\cdot)$-notation are also specified. The main technical tool that we use in our proofs is a multidimensional version
of the Berry-Esseen theorem developed by Bentkus [17]. This is stated as Theorem 6 in Section VIII. We start with
definitions followed by the statement of the theorem. We then discuss the implications of the result. The proof of
the main theorem is provided in Section VIII-B.



A. Definitions 

Let $(\mathcal{X}_1, \mathcal{X}_2, p_{X_1,X_2}(x_1, x_2))$ be a discrete memoryless multiple source (DMMS). This means that $(X_1^n, X_2^n) \sim
\prod_{k=1}^{n} p_{X_1,X_2}(x_{1k}, x_{2k})$, i.e., the source is independent and identically distributed (i.i.d.). We remind the reader that
the alphabets $\mathcal{X}_1, \mathcal{X}_2$ are finite. We also assume throughout that $p_{X_1,X_2}(x_1, x_2) > 0$ for every $(x_1, x_2) \in \mathcal{X}_1 \times \mathcal{X}_2$.

Definition 1. An $(n, 2^{nR_1}, 2^{nR_2}, \epsilon)$-SW code consists of two encoders $f_{j,n} : \mathcal{X}_j^n \to \mathcal{M}_j = [2^{nR_j}]$, $j = 1, 2$,
and a decoder $\varphi_n : \mathcal{M}_1 \times \mathcal{M}_2 \to \mathcal{X}_1^n \times \mathcal{X}_2^n$ such that the error probability in (4) with $(\hat{X}_1^n, \hat{X}_2^n) :=
\varphi_n(f_{1,n}(X_1^n), f_{2,n}(X_2^n))$ does not exceed $\epsilon$. The compression rates are defined in the usual way as

$$R_j := \frac{\log |\mathcal{M}_j|}{n}, \quad j = 1, 2. \tag{5}$$

Definition 2. A rate pair $(R_1, R_2)$ is $(n, \epsilon)$-achievable if there exists an $(n, 2^{nR_1}, 2^{nR_2}, \epsilon)$-SW code for the DMMS
$p_{X_1,X_2}(x_1, x_2)$. The $(n, \epsilon)$-optimal rate region $\mathscr{R}^*_{\mathrm{SW}}(n, \epsilon) \subset \mathbb{R}^2$ is the set of all $(n, \epsilon)$-achievable rate pairs.

Traditionally, optimal rate regions are defined with an additional closure operation [1]. However, as our analysis
is on the finite blocklength setting, we do not take the closure in Definition 2.

For a positive-semidefinite symmetric matrix $\mathbf{V} \in \mathbb{R}^{d \times d}$, let the random vector $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \mathbf{V})$. Note that $\mathcal{N}(\mathbf{0}, \mathbf{V})$
is a degenerate Gaussian if $\mathbf{V}$ is singular. If $\mathrm{rank}(\mathbf{V}) = r < d$, all the probability mass of $p(\mathbf{u}) = \mathcal{N}(\mathbf{u}; \mathbf{0}, \mathbf{V})$ lies
in a subspace of dimension $r$ in $\mathbb{R}^d$. Define the set

$$\mathscr{S}(\mathbf{V}, \epsilon) := \{\mathbf{z} \in \mathbb{R}^d : \mathbb{P}(\mathbf{Z} \le \mathbf{z}) \ge 1 - \epsilon\}. \tag{6}$$

Note that $\mathscr{S}(\mathbf{V}, \epsilon) \subset \mathbb{R}^d$ is well-defined even if $\mathbf{V}$ is singular. Furthermore, $\mathscr{S}(\mathbf{V}, \epsilon') \subset \mathscr{S}(\mathbf{V}, \epsilon)$ if $\epsilon' \le \epsilon$. This set
is analogous to the (inverse) cumulative distribution function of a zero-mean Gaussian with covariance matrix $\mathbf{V}$.
Indeed, the probability in (6) can be written out as $\mathbb{P}(\mathbf{Z} \le \mathbf{z}) = \int_{-\infty}^{z_1} \int_{-\infty}^{z_2} \cdots \int_{-\infty}^{z_d} \mathcal{N}(\mathbf{u}; \mathbf{0}, \mathbf{V})\, d\mathbf{u}$. If $\epsilon < \frac{1}{2}$, $\mathscr{S}(\mathbf{V}, \epsilon)$
is a convex, unbounded set in the positive orthant in $\mathbb{R}^d$. The boundary of $\mathscr{S}(\mathbf{V}, \epsilon)$ is smooth if $\mathbf{V}$ is
positive-definite. We shall see that this set scaled by $\frac{1}{\sqrt{n}}$, namely $\frac{1}{\sqrt{n}} \mathscr{S}(\mathbf{V}, \epsilon)$, plays an important role in the specification of
the $(n, \epsilon)$-optimal rate region. This set is diagrammed in two dimensions (for ease of visualization) in Fig. 1. We
note that the boundaries are indeed curved due to the fact that $\mathbf{V} \succ 0$. Note that as $n$ increases to infinity or $\epsilon$
increases towards $1/2$, the boundaries are translated closer to the horizontal and vertical axes. Also observe that as
the condition number$^1$ of $\mathbf{V}$ increases, i.e., $\mathbf{V}$ tends towards being singular, the corners of the curves become "sharper"
(or "less rounded"). Indeed, in the limiting case when $\mathbf{V}$ has rank one, the support of $p(\mathbf{u}) = \mathcal{N}(\mathbf{u}; \mathbf{0}, \mathbf{V})$ belongs
to a subspace of dimension one. In this case, the set $\mathscr{S}(\mathbf{V}, \epsilon)$ is an axis-aligned, unbounded rectangle (a cuboid in
higher dimensions). See further discussions in Section II-B2.

Fig. 1. The boundaries of the region $\frac{1}{\sqrt{n}} \mathscr{S}(\mathbf{V}, \epsilon)$ for different values of $n$, $\epsilon$ and $\mathbf{V}$. On the left plot, $\mathbf{V}$ = [1 0.01; 0.01 1] (small condition
number) and on the right, $\mathbf{V}$ = [1 0.96; 0.96 1] (large condition number). The regions $\frac{1}{\sqrt{n}} \mathscr{S}(\mathbf{V}, \epsilon)$ lie to the top right corner of the
boundaries. $\mathscr{S}(\mathbf{V}, \epsilon)$ defined in (6) is a subset of $\mathbb{R}^d$ but in the figures, we only illustrate the projection of the set in two dimensions.



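Definition 3. The entropy density vector is defined as

$$\mathbf{h}(X_1, X_2) := \begin{bmatrix} -\log p_{X_1|X_2}(X_1|X_2) \\ -\log p_{X_2|X_1}(X_2|X_1) \\ -\log p_{X_1,X_2}(X_1, X_2) \end{bmatrix}. \tag{7}$$

The mean of the entropy density vector is the vector of entropies, i.e.,

$$\mathbb{E}[\mathbf{h}(X_1, X_2)] = \mathbf{H}(p_{X_1,X_2}) := \begin{bmatrix} H(X_1|X_2) \\ H(X_2|X_1) \\ H(X_1, X_2) \end{bmatrix}. \tag{8}$$

Definition 4. The entropy dispersion matrix $\mathbf{V}(p_{X_1,X_2})$ is the covariance matrix of the random vector $\mathbf{h}(X_1, X_2)$,
i.e.,

$$\mathbf{V}(p_{X_1,X_2}) = \mathrm{Cov}(\mathbf{h}(X_1, X_2)). \tag{9}$$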

We abbreviate the deterministic quantities $\mathbf{H}(p_{X_1,X_2}) \in \mathbb{R}^3$ and $\mathbf{V}(p_{X_1,X_2}) \succeq 0$ as $\mathbf{H}$ and $\mathbf{V}$ respectively. Observe
that $\mathbf{V}$ is an analogue of the scalar dispersion quantities that have gained attention in recent years [9], [21]-[23].
We will find it convenient, in this and following sections, to define the non-negative rate vector $\mathbf{R} \in \mathbb{R}^3$ as

$$\mathbf{R} := \begin{bmatrix} R_1 \\ R_2 \\ R_1 + R_2 \end{bmatrix}. \tag{10}$$
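Definitions 3 and 4 translate directly into a computation over the joint pmf. The sketch below, with a hypothetical example pmf and hypothetical function names, computes the entropy vector $\mathbf{H}$ and the entropy dispersion matrix $\mathbf{V}$ as the mean and covariance of the entropy density vector, and forms the rate vector $\mathbf{R}$ of (10).

```python
from math import log2


def entropy_vector_and_dispersion(pmf):
    """Mean H and covariance V of the entropy density vector h(X1, X2).

    pmf: dict {(x1, x2): probability}, with every listed probability > 0.
    """
    p1, p2 = {}, {}
    for (a, b), p in pmf.items():
        p1[a] = p1.get(a, 0.0) + p  # marginal of X1
        p2[b] = p2.get(b, 0.0) + p  # marginal of X2

    def h(a, b):
        p = pmf[(a, b)]
        # (-log p(x1|x2), -log p(x2|x1), -log p(x1, x2)), as in (7)
        return (-log2(p / p2[b]), -log2(p / p1[a]), -log2(p))

    H = [sum(p * h(a, b)[i] for (a, b), p in pmf.items()) for i in range(3)]
    V = [[sum(p * (h(a, b)[i] - H[i]) * (h(a, b)[j] - H[j])
              for (a, b), p in pmf.items())
          for j in range(3)] for i in range(3)]
    return H, V


def rate_vector(r1, r2):
    """Rate vector R = (R1, R2, R1 + R2) as in (10)."""
    return (r1, r2, r1 + r2)


def in_asymptotic_region(r1, r2, H):
    """Check the three Slepian-Wolf inequalities in (3)."""
    return all(r >= Hi for r, Hi in zip(rate_vector(r1, r2), H))


# Hypothetical joint pmf on {0, 1} x {0, 1}.
example_pmf = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
H, V = entropy_vector_and_dispersion(example_pmf)
```

The third entry of `H` is the joint entropy, and the chain rule $H(X_1, X_2) = H(X_1) + H(X_2|X_1)$ provides a quick sanity check on the second entry.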



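$^1$ Recall that the condition number of $\mathbf{V}$ is the ratio of its maximum to minimum eigenvalues, i.e., $\mathrm{cond}(\mathbf{V}) = \lambda_{\max}(\mathbf{V})/\lambda_{\min}(\mathbf{V})$.

Definition 5. Define the region $\mathscr{R}_{\mathrm{in}}(n, \epsilon) \subset \mathbb{R}^2$ to be the set of rate pairs $(R_1, R_2)$ that satisfy

$$\mathbf{R} \in \mathbf{H} + \frac{1}{\sqrt{n}} \mathscr{S}(\mathbf{V}, \epsilon) + \frac{\nu \log n}{n} \mathbf{1}, \tag{11}$$

where $\nu := |\mathcal{X}_1||\mathcal{X}_2| + 1$. Also define the region $\mathscr{R}_{\mathrm{out}}(n, \epsilon) \subset \mathbb{R}^2$ to be the set of rate pairs $(R_1, R_2)$ that satisfy

$$\mathbf{R} \in \mathbf{H} + \frac{1}{\sqrt{n}} \mathscr{S}(\mathbf{V}, \epsilon) - \frac{\log n}{n} \mathbf{1}. \tag{12}$$

An illustration of the regions, disregarding the $O(\frac{\log n}{n})$ factors, is provided in Fig. 2. Also see Fig. 3 for how
the regions vary with $n$ and $\epsilon$.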
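Ignoring the $\frac{\log n}{n}$ terms in (11) and (12), membership of a rate pair in the Gaussian approximation of the region reduces to checking whether $\sqrt{n}(\mathbf{R} - \mathbf{H}) \in \mathscr{S}(\mathbf{V}, \epsilon)$. The following sketch estimates this by Monte Carlo simulation. It is only an illustration under stated assumptions: the example pmf, the function names, the number of trials and the seed are all hypothetical, and the sampling trick (writing $\mathbf{Z}$ as a weighted sum of independent standard Gaussians so that its covariance equals $\mathbf{V}$, even when $\mathbf{V}$ is singular) is a convenience of this sketch, not a construction from the paper.

```python
import random
from math import log2, sqrt


def density_vectors(pmf):
    """List of (probability, entropy density vector) over the support of pmf."""
    p1, p2 = {}, {}
    for (a, b), p in pmf.items():
        p1[a] = p1.get(a, 0.0) + p
        p2[b] = p2.get(b, 0.0) + p
    return [(p, (-log2(p / p2[b]), -log2(p / p1[a]), -log2(p)))
            for (a, b), p in pmf.items()]


def in_gaussian_region(pmf, r1, r2, n, eps, trials=20000, seed=0):
    """Monte Carlo check of sqrt(n) (R - H) in S(V, eps), ignoring log(n)/n terms."""
    rng = random.Random(seed)
    dens = density_vectors(pmf)
    H = [sum(p * h[i] for p, h in dens) for i in range(3)]
    z = [sqrt(n) * (r - Hi) for r, Hi in zip((r1, r2, r1 + r2), H)]
    hits = 0
    for _ in range(trials):
        # Z = sum_x sqrt(p(x)) (h(x) - H) G_x with i.i.d. standard normal G_x
        # is zero-mean Gaussian with covariance sum_x p(x) (h - H)(h - H)^T = V.
        Z = [0.0, 0.0, 0.0]
        for p, h in dens:
            g = rng.gauss(0.0, 1.0) * sqrt(p)
            for i in range(3):
                Z[i] += g * (h[i] - H[i])
        hits += all(Z[i] <= z[i] for i in range(3))
    return hits / trials >= 1.0 - eps


# Hypothetical joint pmf on {0, 1} x {0, 1}.
example_pmf = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
```

Because the estimate is random, rate pairs very close to the boundary may be classified inconsistently across seeds; the sketch is meant for points with a clear margin.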

B. Main Result and Interpretation 

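Theorem 1 (Finite Blocklength Slepian-Wolf Region). Let $\epsilon \in (0, 1)$. The $(n, \epsilon)$-optimal rate region $\mathscr{R}^*_{\mathrm{SW}}(n, \epsilon)$
satisfies

$$\mathscr{R}_{\mathrm{in}}(n, \epsilon) \subset \mathscr{R}^*_{\mathrm{SW}}(n, \epsilon) \subset \mathscr{R}_{\mathrm{out}}(n, \epsilon) \tag{13}$$

for all $n$ sufficiently large.

The proof is provided in Section VIII-B. Several aspects of the region are studied numerically in Section VI-A.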

1) Discussion of Theorem 1: The direct part of Theorem 1 is proved using the usual random binning argument [7],
[42] together with a multidimensional Berry-Esseen theorem [17]. The latter allows us to prove an important vector
rate redundancy theorem (Theorem 5). This theorem is a recurring proof technique: it is also used to prove the
direct parts of the analogous results for the multiple-access and broadcast channels. The decoder is a modification
of a minimum empirical entropy [14] decoding rule. More precisely, we require the three empirical entropies
$\hat{H}(X_1^n|X_2^n)$, $\hat{H}(X_2^n|X_1^n)$ and $\hat{H}(X_1^n, X_2^n)$ to be jointly smaller than some perturbed rate vector $\mathbf{R} - \delta_n \mathbf{1}$ where
$\delta_n = O(\frac{\log n}{n})$. By Taylor's theorem, it can be seen that the empirical entropy vector behaves like a multivariate
Gaussian with mean $\mathbf{H}$ and covariance $\mathbf{V}$, explaining the presence of these terms in (11) and (12). The converse
is proved by leveraging an information spectrum theorem by Han [43]. Theorem 1 extends naturally to the case
where there are more than two senders.

By examining $\mathscr{R}_{\mathrm{in}}(n, \epsilon)$ and $\mathscr{R}_{\mathrm{out}}(n, \epsilon)$, it can be seen that we have characterized the $(n, \epsilon)$-optimal rate region
up to an $O(\frac{\log n}{n})$ factor. This residual is a consequence of (i) the loss in universal decoding for the direct part and
(ii) the residual (approximation) terms resulting from the use of the multidimensional Berry-Esseen theorem [17]
to approximate the distribution of a sum of i.i.d. random vectors with a multivariate Gaussian. In Section VIII-B3,
we suggest a maximum-likelihood-based [15], non-universal decoding rule (i.e., one that assumes knowledge of
the source statistics $p_{X_1,X_2}$) for which the symmetry between the direct and converse parts becomes apparent. Also
see the symmetry in [43, Lemmas 7.2.1-2].

Roughly speaking, the $(n, \epsilon)$-rate region approaches the SW region [7] at a rate of $O(\frac{1}{\sqrt{n}})$. This follows from
the multidimensional central limit theorem. The redundancy of the SW problem where there are two sources to be
coded and estimated is characterized by the set $\frac{1}{\sqrt{n}} \mathscr{S}(\mathbf{V}, \epsilon)$ shown in Fig. 1. The boundary of this set represents
the three rates that need to be added to the entropies $H(X_1|X_2)$, $H(X_2|X_1)$ and $H(X_1, X_2)$ if one desires to
operate in the finite blocklength setting. This redundancy set $\frac{1}{\sqrt{n}} \mathscr{S}(\mathbf{V}, \epsilon)$ also governs the interaction among the
two senders. Somewhat unexpectedly, if $\mathbf{V} \succ 0$, which is true for almost all finite alphabet sources,$^2$ the $(n, \epsilon)$-rate
region is not polyhedral. Note that the SW region $\mathscr{R}^*_{\mathrm{SW}}$, given in (3), is polyhedral.

$^2$ If the entries of the matrix $p_{X_1,X_2}(x_1, x_2)$ are chosen according to a distribution which is absolutely continuous with respect to the
Lebesgue measure and subsequently normalized to sum to unity, then $\mathbb{P}(\mathrm{rank}(\mathbf{V}(p_{X_1,X_2})) = 3) = 1$.

2) Singular Entropy Dispersion Matrices: What are the implications of the $(n, \epsilon)$-SW region for singular $\mathbf{V}$'s?
Note that Theorem 1 holds regardless of whether $\mathbf{V}$ is singular or positive-definite (but not for the trivial case
where $\mathbf{V} = \mathbf{0}$, so we assume throughout that $\mathrm{rank}(\mathbf{V}) \ge 1$). Sources for which $\mathbf{V}$ is singular include those for which
(i) $X_1$ and $X_2$ are independent, i.e., $I(X_1; X_2) = 0$, or (ii) either $X_1$ or $X_2$ is uniform over its alphabet. It is easy to see
why $I(X_1; X_2) = 0$ results in a singular $\mathbf{V}$: the third entry in the entropy density vector is a
linear combination of the first two. Thus $\mathbf{V}$ loses rank. Case (ii) was analyzed by Sarvotham et al. [38] where
$X_1, X_2 \in \mathbb{F}_2$, $X_1 \sim \mathrm{Bern}(\frac{1}{2})$, $X_2 = X_1 \oplus N$ with $N \sim \mathrm{Bern}(a)$, $a \in (0, \frac{1}{2})$. The pair of random variables $(X_1, X_2)$
is the so-called discrete symmetric binary source (DSBS) with crossover probability $a$. For the DSBS, Theorem 1
in [38] asserts that the $(n, \epsilon)$-optimal rate region is (up to terms in $o(1/\sqrt{n})$)

$$\mathbf{R} \ge \mathbf{H} + \sqrt{\frac{V_a}{n}}\, Q^{-1}(\epsilon)\, \mathbf{1}, \tag{14}$$

where $V_a$ is a scalar entropy dispersion [to be specified precisely in (15)]. Thus, the three inequalities are decoupled.
In contrast, in Theorem 1 we showed that the $(n, \epsilon)$-optimal rate region for general DMMSes is such that the
constraints on $R_1$, $R_2$ and $R_1 + R_2$ are coupled through the set $\mathscr{S}(\mathbf{V}, \epsilon)$.

Let us relate (14) to our Theorem 1. For the DSBS, it can be verified that $\mathrm{rank}(\mathbf{V}) = 1$ and that $\mathbf{V}$ is a scalar
multiple of the all ones matrix, i.e., $\mathbf{V} = V_a \mathbf{1}_{3 \times 3}$ where $V_a = \mathrm{Var}(-\log p_{X_1|X_2}(X_1|X_2)) = \mathrm{Var}(-\log p_{X_2|X_1}(X_2|X_1)) =
\mathrm{Var}(-\log p_{X_1,X_2}(X_1, X_2))$. Intuitively, this is because there is only one degree of freedom in a DSBS with crossover
probability $a$. The parameter $V_a$ is exactly the scalar dispersion in (14). In fact, it can be calculated in closed form
for the DSBS with crossover probability $a$ as

$$V_a = a(1 - a) \left[\log \frac{1 - a}{a}\right]^2. \tag{15}$$

For this source, since $\mathrm{rank}(\mathbf{V}) = 1$, all the probability mass of the degenerate Gaussian $\mathcal{N}(\mathbf{0}, \mathbf{V})$ lies in a subspace
of dimension one. Therefore, it is easy to see that $\mathscr{S}(\mathbf{V}, \epsilon)$ defined in (6) degenerates to the axis-aligned cuboid

$$\mathscr{S}(\mathbf{V}, \epsilon) = \{\mathbf{z} \in \mathbb{R}^3 : \mathbf{z} \ge \sqrt{V_a}\, Q^{-1}(\epsilon)\, \mathbf{1}\}. \tag{16}$$

The quantity $\sqrt{V_a/n}\, Q^{-1}(\epsilon)$ is the rate redundancy [35]-[38] for fixed-length SW coding in the finite blocklength
regime for a DSBS. In this case, with $\mathscr{S}(\mathbf{V}, \epsilon)$ as in (16), the inner and outer bounds of the $(n, \epsilon)$-optimal rate region
degenerate to (14). Thus the fixed-length results in [35]-[38] are special cases of our general result. This argument
for singular dispersion matrices can be formalized and we do so in the latter half of the proof of Theorem 5.
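The claims about the DSBS can be checked numerically. The sketch below, with hypothetical function names, builds the joint pmf of a DSBS with crossover probability `a`, computes the covariance of the entropy density vector directly, and compares every entry with the variance of a two-valued random variable taking the values $-\log a$ and $-\log(1-a)$, which is what the scalar dispersion $V_a$ equals for this source. It also evaluates the rate redundancy $\sqrt{V_a/n}\, Q^{-1}(\epsilon)$.

```python
from math import log2, sqrt
from statistics import NormalDist


def dsbs_dispersion(a):
    """Variance of -log2 p(X1|X2) for a DSBS with crossover probability a."""
    return a * (1.0 - a) * log2((1.0 - a) / a) ** 2


def dsbs_entropy_dispersion_matrix(a):
    """Covariance of the entropy density vector, computed from the joint pmf."""
    pmf = {(x1, x2): 0.5 * (a if x1 != x2 else 1.0 - a)
           for x1 in (0, 1) for x2 in (0, 1)}
    # Both marginals are uniform, so p(x1|x2) = p(x2|x1) = 2 p(x1, x2).
    h = {k: (-log2(2 * p), -log2(2 * p), -log2(p)) for k, p in pmf.items()}
    mean = [sum(pmf[k] * h[k][i] for k in pmf) for i in range(3)]
    return [[sum(pmf[k] * (h[k][i] - mean[i]) * (h[k][j] - mean[j]) for k in pmf)
             for j in range(3)] for i in range(3)]


def dsbs_rate_redundancy(a, n, eps):
    """sqrt(V_a / n) * Q^{-1}(eps), the back-off added to each entry of H."""
    return sqrt(dsbs_dispersion(a) / n) * NormalDist().inv_cdf(1.0 - eps)


V = dsbs_entropy_dispersion_matrix(0.11)
```

All nine entries of `V` coincide, so the matrix is a scalar multiple of the all ones matrix and has rank one, in agreement with the discussion above.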

III. Dispersion of the Multiple-Access Channel 

The multiple-access channel or MAC is the channel coding dual to the Slepian-Wolf problem described in
Section II [14, Sec. 3.2]. The MAC model has found numerous applications, especially in wireless communications
where multiple parties would like to communicate to a single base station reliably. For a MAC, there are two (or
more) independent messages $M_1 \in [2^{nR_1}]$ and $M_2 \in [2^{nR_2}]$. The two messages, which are uniformly distributed over
their respective message sets, are separately encoded into codewords $X_1^n \in \mathcal{X}_1^n$ and $X_2^n \in \mathcal{X}_2^n$ respectively.
These codewords are the inputs to a discrete memoryless multiple-access channel (DM-MAC) $W : \mathcal{X}_1 \times \mathcal{X}_2 \to \mathcal{Y}$.
The decoder receives $Y^n$ from the output of the DM-MAC and provides estimates $\hat{M}_1$ and $\hat{M}_2$ of the messages or
declares that a decoding error has occurred. It is usually desired to send both messages reliably, that is, to ensure
that the average probability of error

$$P_e^{(n)} := \Pr\big(\{\hat{M}_1 \ne M_1\} \cup \{\hat{M}_2 \ne M_2\}\big) \tag{17}$$

tends to zero as $n \to \infty$. The set of achievable rates, or the capacity region $\mathscr{C}^*_{\mathrm{MAC}}$, is given by

$$\begin{aligned}
R_1 &\le I(X_1;Y|X_2,Q)\\
R_2 &\le I(X_2;Y|X_1,Q)\\
R_1 + R_2 &\le I(X_1,X_2;Y|Q)
\end{aligned} \tag{18}$$

for some $p_Q$, $p_{X_1|Q}$ and $p_{X_2|Q}$ with $|\mathcal{Q}| \le 2$. This asymptotic result was proved independently by Ahlswede [2] and
Liao [3] and can be written in an alternative form which involves taking the convex hull instead of the introduction
of the auxiliary time-sharing variable $Q$. See [1] for further discussions. A somewhat surprising result in the theory
of MACs, which differs from point-to-point channel coding, is that the capacity region for average probability of
error is strictly larger than that for maximal probability of error [44]. We emphasize that we focus on the average
probability of error defined in (17) throughout. Note that as with the SW case, we can consider $\Pr(\hat{M}_1 \ne M_1)$,
$\Pr(\hat{M}_2 \ne M_2)$ and $P_e^{(n)}$ separately and place upper bounds on each of these constituent error probabilities but, for
simplicity, we consider only $P_e^{(n)}$ in (17).

In this section, we ask what the finite blocklength analogue of the region in (18) is. More precisely, we fix a
blocklength $n$ and a tolerable average error probability $\epsilon$. We then attempt to find the set of achievable rate pairs
as a function of $(n,\epsilon)$. We call this the $(n,\epsilon)$-capacity region and denote it by $\mathscr{C}^*_{\mathrm{MAC}}(n,\epsilon)$. Our main result in
this section is the derivation of an inner bound to $\mathscr{C}^*_{\mathrm{MAC}}(n,\epsilon)$. In other words, we propose a coding and decoding
procedure over a finite block of symbols of length $n$ that satisfies $P_e^{(n)} \le \epsilon$. Our coding scheme is the coded
time-sharing procedure by Han and Kobayashi [45]. The decoding scheme is similar to MMI decoding [15]. However,
the error probability analysis is rather different.

As with the SW region, note that the capacity region in (18) is a polyhedron (in fact a pentagon) for a given set of
time-sharing and input distributions $p_Q$, $p_{X_1|Q}$ and $p_{X_2|Q}$. While we currently do not have a full characterization of
the $(n,\epsilon)$-capacity region and proving a converse appears to be difficult, we remark that, somewhat surprisingly, the
boundary of the inner bound for fixed $p_Q$, $p_{X_1|Q}$ and $p_{X_2|Q}$ is curved if the so-called information dispersion matrix,
to be defined in (24), is non-singular. This is akin to the $(n,\epsilon)$-SW region (Theorem 1), and the information
dispersion matrix is analogous to the entropy dispersion matrix defined for the SW problem. As in the SW
case, the main tools that we use are a multidimensional version of the Berry-Esseen theorem [17] and the vector rate
redundancy theorem (Theorem 5). We focus on the two-sender case. Generalizations to more than two senders are
straightforward. We begin with definitions and then state the main result. We also provide some hints on how a possible
converse theorem could be proved after the main achievability result is stated.



A. Definitions 

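Let $(\mathcal{X}_1,\mathcal{X}_2,W,\mathcal{Y})$ be a DM-MAC, i.e., for any input codeword sequences $x_1^n \in \mathcal{X}_1^n$ and $x_2^n \in \mathcal{X}_2^n$,

$$W^n(y^n|x_1^n,x_2^n) = \prod_{k=1}^{n} W(y_k|x_{1k},x_{2k}). \tag{19}$$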



Definition 6. An $(n, 2^{nR_1}, 2^{nR_2}, \epsilon)$-code for the DM-MAC $(\mathcal{X}_1,\mathcal{X}_2,W,\mathcal{Y})$ consists of two encoders $f_{j,n} : \mathcal{M}_j = [2^{nR_j}] \to \mathcal{X}_j^n$, $j = 1,2$,
and a decoder $\varphi_n : \mathcal{Y}^n \to \mathcal{M}_1 \times \mathcal{M}_2$ such that the average error probability defined in
(17) does not exceed $\epsilon$. Note that the outputs of the encoders are $f_{j,n}(M_j)$, $j = 1,2$, and the outputs of the decoder
are the estimates $(\hat{M}_1, \hat{M}_2) = \varphi_n(Y^n)$. The coding rates are defined in the usual way, as for the SW problem.

Definition 7. A rate pair $(R_1,R_2)$ is $(n,\epsilon)$-achievable if there exists an $(n, 2^{nR_1}, 2^{nR_2}, \epsilon)$-code for the DM-MAC
$(\mathcal{X}_1,\mathcal{X}_2,W,\mathcal{Y})$. The $(n,\epsilon)$-capacity region $\mathscr{C}^*_{\mathrm{MAC}}(n,\epsilon) \subset \mathbb{R}^2$ is the set of all $(n,\epsilon)$-achievable rate pairs.

Again, capacity regions are traditionally defined with an additional closure operation [1]. However, as we are
analyzing the finite blocklength regime, we refrain from doing so. Also, in contrast to the asymptotic setting, it is not
obvious that $\mathscr{C}^*_{\mathrm{MAC}}(n,\epsilon)$ is convex. The usual time-sharing argument [14, Lemma 3.2.2] (that the juxtaposition
of two good multiple-access codes leads to a good but longer code) does not hold because the blocklength is
constrained to be a fixed integer $n$, so juxtaposition is not allowed. Fix a triple of distributions $p_Q(q)$, $p_{X_1|Q}(x_1|q)$
and $p_{X_2|Q}(x_2|q)$. Given the channel $W$, these distributions induce the following output conditional distributions



$$p_{Y|X_2,Q}(y|x_2,q) := \sum_{x_1} p_{X_1|Q}(x_1|q)\,W(y|x_1,x_2) \tag{20}$$

$$p_{Y|Q}(y|q) := \sum_{x_1,x_2} p_{X_1|Q}(x_1|q)\,p_{X_2|Q}(x_2|q)\,W(y|x_1,x_2). \tag{21}$$

The output conditional distribution $p_{Y|X_1,Q}$ is defined similarly to $p_{Y|X_2,Q}$ with 1 replaced by 2 and vice versa.
Definition 8. The information density vector is defined as

$$\mathbf{i}(Q,X_1,X_2,Y) := \begin{bmatrix}
\log[W(Y|X_1,X_2)/p_{Y|X_2,Q}(Y|X_2,Q)]\\
\log[W(Y|X_1,X_2)/p_{Y|X_1,Q}(Y|X_1,Q)]\\
\log[W(Y|X_1,X_2)/p_{Y|Q}(Y|Q)]
\end{bmatrix} \tag{22}$$

where the distributions $p_{Y|X_2,Q}$, $p_{Y|X_1,Q}$ and $p_{Y|Q}$ are defined in (20)-(21). The random variables $(Q,X_1,X_2,Y)$
have joint distribution $p_Q\,p_{X_1|Q}\,p_{X_2|Q}\,W$.

Observe that the expectation of the information density vector with respect to $p_Q\,p_{X_1|Q}\,p_{X_2|Q}\,W$ is the vector of
mutual information quantities in (18), i.e.,

$$\mathbb{E}[\mathbf{i}(Q,X_1,X_2,Y)] = \mathbf{I}(p_Q,p_{X_1|Q},p_{X_2|Q},W) := \begin{bmatrix}
I(X_1;Y|X_2,Q)\\
I(X_2;Y|X_1,Q)\\
I(X_1,X_2;Y|Q)
\end{bmatrix}. \tag{23}$$






Definition 9. The information dispersion matrix $\mathbf{V}(p_Q,p_{X_1|Q},p_{X_2|Q},W)$ is the covariance matrix of the random
vector $\mathbf{i}(Q,X_1,X_2,Y)$, i.e.,

$$\mathbf{V}(p_Q,p_{X_1|Q},p_{X_2|Q},W) = \mathrm{Cov}(\mathbf{i}(Q,X_1,X_2,Y)). \tag{24}$$

If there is no risk of confusion, we abbreviate the deterministic vector $\mathbf{I}(p_Q,p_{X_1|Q},p_{X_2|Q},W) \in \mathbb{R}^3$ and the
deterministic matrix $\mathbf{V}(p_Q,p_{X_1|Q},p_{X_2|Q},W) \succeq 0$ as $\mathbf{I}$ and $\mathbf{V}$ respectively. We assume throughout that the channel
and the input distributions are such that $\mathrm{rank}(\mathbf{V}) \ge 1$, i.e., $\mathbf{V}$ is not the all-zeros matrix. Recall the definition of
the rate vector $\mathbf{R} = [R_1, R_2, R_1 + R_2]^T$ in (10).

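Definition 10. Given a triple of input distributions $(p_Q,p_{X_1|Q},p_{X_2|Q})$, define the region $\mathscr{R}(n,\epsilon;p_Q,p_{X_1|Q},p_{X_2|Q}) \subset \mathbb{R}^2$
to be the set of rate pairs $(R_1,R_2)$ that satisfy

$$\mathbf{R} \in \mathbf{I} - \frac{1}{\sqrt{n}}\,\mathscr{S}(\mathbf{V},\epsilon) - \frac{\nu\log n}{n}\,\mathbf{1}, \tag{25}$$

where $\nu := |\mathcal{Q}||\mathcal{X}_1||\mathcal{X}_2||\mathcal{Y}| + 1$. Here $\mathbf{I} := \mathbf{I}(p_Q,p_{X_1|Q},p_{X_2|Q},W)$ and $\mathbf{V} := \mathbf{V}(p_Q,p_{X_1|Q},p_{X_2|Q},W)$, and the set
$\mathscr{S}(\mathbf{V},\epsilon) \subset \mathbb{R}^3$ is the one defined for the SW problem.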
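To make Definitions 8 and 9 concrete, the following minimal sketch computes the mean vector and covariance matrix of the information density vector by direct enumeration over $(q, x_1, x_2, y)$. It is our own illustration rather than code from the paper: the function name `mac_info_stats`, the nested-list layout of the distributions, the base-2 logarithm, and the choice of the binary adder MAC ($Y = X_1 + X_2$ with uniform inputs and a trivial time-sharing variable) as a test channel are all assumptions.

```python
import math
from itertools import product


def mac_info_stats(p_q, p_x1_q, p_x2_q, W):
    """Return (I, V): mean and covariance of the information density vector (22).

    Layout (an assumption of this sketch): p_q[q], p_x1_q[q][x1],
    p_x2_q[q][x2], and W[x1][x2][y] = W(y | x1, x2). Logs are base 2.
    """
    nq, n1, n2 = len(p_q), len(p_x1_q[0]), len(p_x2_q[0])
    ny = len(W[0][0])

    def p_y_x2q(y, x2, q):  # output distribution (20)
        return sum(p_x1_q[q][a] * W[a][x2][y] for a in range(n1))

    def p_y_x1q(y, x1, q):  # (20) with the roles of 1 and 2 swapped
        return sum(p_x2_q[q][b] * W[x1][b][y] for b in range(n2))

    def p_y_q(y, q):        # output distribution (21)
        return sum(p_x1_q[q][a] * p_x2_q[q][b] * W[a][b][y]
                   for a in range(n1) for b in range(n2))

    atoms = []  # (probability, information density vector) pairs
    for q, x1, x2, y in product(range(nq), range(n1), range(n2), range(ny)):
        w = W[x1][x2][y]
        prob = p_q[q] * p_x1_q[q][x1] * p_x2_q[q][x2] * w
        if prob == 0.0:
            continue  # zero-probability outcomes do not contribute
        dens = (math.log2(w / p_y_x2q(y, x2, q)),
                math.log2(w / p_y_x1q(y, x1, q)),
                math.log2(w / p_y_q(y, q)))
        atoms.append((prob, dens))

    I = [sum(p * d[i] for p, d in atoms) for i in range(3)]
    V = [[sum(p * (d[i] - I[i]) * (d[j] - I[j]) for p, d in atoms)
          for j in range(3)] for i in range(3)]
    return I, V


# Illustrative channel: binary adder MAC, Y = X1 + X2, uniform inputs, |Q| = 1.
p_q = [1.0]
p_x1_q = [[0.5, 0.5]]
p_x2_q = [[0.5, 0.5]]
W = [[[1.0 if y == x1 + x2 else 0.0 for y in range(3)]
      for x2 in range(2)] for x1 in range(2)]
I_vec, V_mat = mac_info_stats(p_q, p_x1_q, p_x2_q, W)
```

For this illustrative channel the first two information densities are constant, so only the sum-rate entry of the dispersion matrix is non-zero; the matrix has rank one and therefore still satisfies the standing assumption $\mathrm{rank}(\mathbf{V}) \ge 1$.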



B. Main Result and Interpretation 

Theorem 2 (Inner Bound to Finite Blocklength DM-MAC Capacity Region). Let $\epsilon \in (0,1)$. The $(n,\epsilon)$-capacity
region $\mathscr{C}^*_{\mathrm{MAC}}(n,\epsilon)$ for the DM-MAC satisfies

$$\bigcup_{p_Q,\,p_{X_1|Q},\,p_{X_2|Q}} \mathscr{R}(n,\epsilon;p_Q,p_{X_1|Q},p_{X_2|Q}) \subset \mathscr{C}^*_{\mathrm{MAC}}(n,\epsilon) \tag{26}$$

for all $n$ sufficiently large. Furthermore, the union over $p_Q$ can be restricted to those discrete distributions with
support $\mathcal{Q}$ whose cardinality $|\mathcal{Q}| \le 9$.

This theorem is proved in Section VIII-C. The bounds on cardinality can be proved using the support lemma [14,
Theorem 3.4]. See Section VIII-C2. The inner bound is illustrated for various $n$ and $\epsilon$ in Section VI. It is relatively
straightforward to extend the result to the case where there is a cost constraint on the codewords, i.e.,

$$\frac{1}{n}\sum_{k=1}^{n} \Lambda_j(x_{jk}(m_j)) \le \Gamma_j, \tag{27}$$

for $j = 1,2$ and all $(m_1,m_2) \in \mathcal{M}_1 \times \mathcal{M}_2$. We omit the statement and proof. In Section VIII-C3, we comment
on how the coding scheme can be modified to deal with channels with arbitrary input and output alphabets at the
cost of universality in decoding.

From (25), we see that the inner bound to the $(n,\epsilon)$-capacity region $\mathscr{C}^*_{\mathrm{MAC}}(n,\epsilon)$ approaches the usual MAC
region (18) at a rate of $O(1/\sqrt{n})$ for fixed input distributions. Unsurprisingly, this rate is a consequence of the
multidimensional central limit theorem. The redundancy set $\mathscr{S}(\mathbf{V},\epsilon)$, which manifests itself in (25) scaled by $1/\sqrt{n}$, is exactly
the loss in rate relative to the three mutual information quantities in (18) that one must incur when operating in the finite
blocklength setting with average error probability $\epsilon$.

For the proof of Theorem 2, we use the coded time-sharing scheme introduced by Han and Kobayashi in their
seminal work on interference channels [45]. The decoding step, however, is novel and is a modification of the
maximum mutual information (MMI) decoding rule [14], [15]. This MMI-decoding step allows us to define a new
notion of typicality for empirical mutual information quantities. Interestingly, the error event that contributes to the
$\epsilon$ probability of error is the one in which the transmitted pair of codewords $x_1^n(m_1), x_2^n(m_2)$ is not jointly typical (in
a refined sense of typicality) with the output of the channel $y^n$ (and a time-sharing sequence $q^n$). The probabilities
of the other error events (that there exists another codeword jointly typical with the output) can be shown to
be vanishingly small relative to $\epsilon$. Intuitively, this is because we are operating close to the boundaries of the rate
region for given input distributions, i.e., at very high rates. The sphere-packing argument [14], [33], [16] implies
that the dominant (typical) error events at high rates are of the form where a large number of incorrect codewords
are jointly typical with the transmitted one, i.e., what Forney calls Type I error [46]. Thus, the probability of error
is dominated by an atypically large noise event and expurgation does not improve the exponents.
A converse (outer bound to $\mathscr{C}^*_{\mathrm{MAC}}(n,\epsilon)$) has unfortunately remained elusive. To the best of the authors' knowledge,
there are three strong converse proof techniques for the average probability of error of the DM-MAC. The
first is by Han [43, Lemma 7.10.2], [47, Lemma 4] and is based on information spectrum ideas. Applying it is
difficult because the specified input distributions $p_{X_1^n}$, $p_{X_2^n}$ are the Fano-distributions on the codewords (see the footnotes). Since $p_{X_j^n}$
does not decompose into independent factors, the Berry-Esseen theorem is not directly applicable. The second is
by Dueck [48], who used the blowing-up lemma [14, Sec. 1.5]. The third and most promising technique is by
Ahlswede [49], who built on Dueck's work [48]. Ahlswede first applies Augustin's strong converse for DMCs [50]
to the so-called Fano*-distribution (see the footnotes), which factorizes. Then, he obtains a region that resembles the capacity region
for the DM-MAC. Finally, he utilizes a wringing technique to remove (or wring out) the dependence between
$X_1$ and $X_2$. Unfortunately, it appears that the use of both the blowing-up lemma and the wringing technique
results in estimates of an outer bound that are too loose to match the $O(1/\sqrt{n})$ dispersion term in the inner bound
in Theorem 2. Another major obstacle to proving a finite blocklength-style converse is the need to introduce the
time-sharing variable $Q$ or the convex hull operation judiciously. Hence, we believe that genuinely new strong
converse techniques for the DM-MAC (and other multi-user problems) have to be developed to prove a tight outer
bound that matches (or approximately matches) our inner bound.



IV. Dispersion of the Asymmetric Broadcast Channel 

We now turn our attention to the broadcast channel [5], which is another fundamental problem in network
information theory. Despite more than 40 years of research, the capacity region has resisted attempts at proof. One
special instance in which the capacity is known is the so-called asymmetric broadcast channel or ABC [4], [14].
The ABC is also known as the broadcast channel with degraded message sets. In fact, the analysis in this section
applies to the general broadcast channel. We focus on the ABC for concreteness.

In the ABC problem, there are two independent messages $M_1 \in [2^{nR_1}]$ and $M_2 \in [2^{nR_2}]$ at the sender. These
two messages, which are uniformly distributed over their respective message sets, are encoded into a codeword
$X^n \in \mathcal{X}^n$. This codeword is then the input to a discrete memoryless asymmetric broadcast channel (DM-ABC)
$W : \mathcal{X} \to \mathcal{Y}_1 \times \mathcal{Y}_2$. Decoder 1 receives $Y_1^n$ and estimates both messages $M_1$ and $M_2$, while decoder 2 receives
$Y_2^n$ and estimates only $M_2$. Let the estimates of the messages at decoder 1 be denoted as $(\hat{M}_1,\hat{M}_2)$ and let the
estimate of message 2 at decoder 2 be denoted as $\check{M}_2$. The average error probability is defined as

$$P_e^{(n)} := \Pr\big(\{\hat{M}_1 \ne M_1\} \cup \{\hat{M}_2 \ne M_2\} \cup \{\check{M}_2 \ne M_2\}\big). \tag{28}$$

Note that the error event above corresponds to receiver 1 not decoding either message correctly or receiver 2
not decoding her intended message $M_2$ correctly. An alternative formulation, which turns out to be more challenging,
would be to define average probabilities of error for receiver 1 and receiver 2 and to put different upper bounds on
these error probabilities.

Returning to our setup, it is usually desired to drive $P_e^{(n)}$, defined in (28), to zero as the blocklength $n \to \infty$.
The set of achievable rate pairs $(R_1,R_2)$, first derived by Körner and Marton [4], is then given by the region

$$\begin{aligned}
R_1 &\le I(X;Y_1|U)\\
R_2 &\le I(U;Y_2)\\
R_1 + R_2 &\le I(X;Y_1)
\end{aligned} \tag{29}$$

for some $p_{U,X}(u,x)$ where $|\mathcal{U}| \le |\mathcal{X}| + 1$ and $U - X - (Y_1,Y_2)$ form a Markov chain in that order. The proof for
the direct part uses the superposition coding technique [5]. The auxiliary variable $U$ basically plays the role of the
cloud center while the input random variable $X$ plays the role of a satellite codeword centered at the cloud center
$U$. A weak converse can be proved using the Csiszár sum identity [1]. For a strong converse, see [14, Sec. 3.3] or
the original work by Körner and Marton [4].

In this section, we again depart from the traditional asymptotic setting. More specifically, we fix a (finite)
blocklength $n$ and a tolerable upper bound on the average error probability in (28), say $\epsilon$. We attempt to characterize
the so-called $(n,\epsilon)$-capacity region $\mathscr{C}^*_{\mathrm{ABC}}(n,\epsilon)$, i.e., the set of $(n,\epsilon)$-achievable rate pairs $(R_1,R_2)$ for the ABC
$W$. As with the DM-MAC, a rate pair $(R_1,R_2)$ is said to be $(n,\epsilon)$-achievable for the ABC $W$ if there exists a
code, i.e., an encoder and two decoders operating on blocks of symbols of length $n$, for which $P_e^{(n)} \le \epsilon$. Precise
definitions will be provided in Section IV-A.

Footnote: Given DM-MAC codebooks $\mathcal{C}_j := \{x_j^n(m_j) : m_j \in \mathcal{M}_j\}$, $j = 1,2$, the Fano-distribution $p_{X_j^n}$ is the uniform distribution over $\mathcal{C}_j$.
Footnote: The Fano*-distribution is $p_{X_j^n} = \prod_{k=1}^{n} p_{X_{jk}}$ with $p_{X_{jk}}(a) := \frac{1}{|\mathcal{M}_j|}\,|\{m_j : x_{jk}(m_j) = a\}|$ for all $a \in \mathcal{X}_j$.

We show in this section that the tools we have developed for SW coding and the MAC, such as the vector rate
redundancy theorem, are versatile enough for us to provide an inner bound to $\mathscr{C}^*_{\mathrm{ABC}}(n,\epsilon)$. Our coding scheme
is based on superposition coding [5] but the analysis is somewhat different and uses a variant of MMI-decoding.
As for the DM-MAC, all three inequalities that characterize the capacity region in (29) are "coupled" through an
information dispersion matrix for a given input distribution $p_{U,X}$. Thus, the main result in this section is conceptually
very similar to that for the DM-MAC. And as with the DM-MAC, we do not yet have an outer bound for this
problem but we note that strong converses for this problem are available [51]. We start with relevant definitions.



A. Definitions 

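Let $(\mathcal{X},W,\mathcal{Y}_1,\mathcal{Y}_2)$ be a 2-receiver DM-ABC. That is, given an input codeword sequence $x^n \in \mathcal{X}^n$,

$$W^n(y_1^n,y_2^n|x^n) = \prod_{k=1}^{n} W(y_{1k},y_{2k}|x_k). \tag{30}$$

We will use the notations $W_1$ and $W_2$ to denote the $\mathcal{Y}_1$- and $\mathcal{Y}_2$-marginals of $W$ respectively, i.e., $W_1(y_1|x) :=
\sum_{y_2} W(y_1,y_2|x)$ and similarly for $W_2(y_2|x)$.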

Definition 11. An $(n, 2^{nR_1}, 2^{nR_2}, \epsilon)$-code for the DM-ABC $(\mathcal{X},W,\mathcal{Y}_1,\mathcal{Y}_2)$ consists of one encoder $f_n : \mathcal{M}_1 \times \mathcal{M}_2 =
[2^{nR_1}] \times [2^{nR_2}] \to \mathcal{X}^n$, and two decoders $\varphi_{1,n} : \mathcal{Y}_1^n \to \mathcal{M}_1 \times \mathcal{M}_2$ and $\varphi_{2,n} : \mathcal{Y}_2^n \to \mathcal{M}_2$ such that the average
error probability defined in (28) does not exceed $\epsilon$. Note that the output of the encoder is $f_n(M_1,M_2)$ and the
outputs of the decoders are the estimates $(\hat{M}_1,\hat{M}_2) = \varphi_{1,n}(Y_1^n)$ and $\check{M}_2 = \varphi_{2,n}(Y_2^n)$. The coding rates are defined
in the usual way.

Definition 12. A rate pair $(R_1,R_2)$ is $(n,\epsilon)$-achievable if there exists an $(n, 2^{nR_1}, 2^{nR_2}, \epsilon)$-code for the DM-ABC
$(\mathcal{X},W,\mathcal{Y}_1,\mathcal{Y}_2)$. The $(n,\epsilon)$-capacity region $\mathscr{C}^*_{\mathrm{ABC}}(n,\epsilon) \subset \mathbb{R}^2$ is the set of all $(n,\epsilon)$-achievable rate pairs.

Fix an input distribution $p_{U,X} \in \mathscr{P}(\mathcal{U} \times \mathcal{X})$ where the auxiliary random variable $U$ takes values in some finite
set $\mathcal{U}$. Given the channel $W$ and input distribution $p_{U,X}$, the following distributions are defined:



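$$p_{Y_j|U}(y_j|u) := \sum_{x} W_j(y_j|x)\,p_{X|U}(x|u), \tag{31}$$

$$p_{Y_j}(y_j) := \sum_{x} W_j(y_j|x)\,p_X(x), \qquad j = 1,2. \tag{32}$$

Definition 13. The information density vector for the ABC is defined as

$$\mathbf{i}(U,X,Y_1,Y_2) := \begin{bmatrix}
\log[W_1(Y_1|X)/p_{Y_1|U}(Y_1|U)]\\
\log[p_{Y_2|U}(Y_2|U)/p_{Y_2}(Y_2)]\\
\log[W_1(Y_1|X)/p_{Y_1}(Y_1)]
\end{bmatrix} \tag{33}$$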



where the distributions $p_{Y_1|U}$, $p_{Y_2|U}$ and $p_{Y_1}$, $p_{Y_2}$ are defined in (31) and (32) respectively. The random variables
$(U,X,Y_1,Y_2)$ have joint distribution $p_{U,X}W$.

Observe that the expectation of the information density vector with respect to $p_{U,X}W$ is the vector of mutual
information quantities, i.e.,

$$\mathbb{E}[\mathbf{i}(U,X,Y_1,Y_2)] = \mathbf{I}(p_{U,X},W) := \begin{bmatrix}
I(X;Y_1|U)\\
I(U;Y_2)\\
I(X;Y_1)
\end{bmatrix}. \tag{34}$$



Definition 14. The information dispersion matrix $\mathbf{V}(p_{U,X},W)$ is the covariance matrix of the random vector
$\mathbf{i}(U,X,Y_1,Y_2)$, i.e.,

$$\mathbf{V}(p_{U,X},W) = \mathrm{Cov}(\mathbf{i}(U,X,Y_1,Y_2)). \tag{35}$$
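As with the SW and MAC cases, we usually abbreviate $\mathbf{I}(p_{U,X},W)$ and $\mathbf{V}(p_{U,X},W)$ as $\mathbf{I}$ and $\mathbf{V}$ respectively.
We will again use the definition of the rate vector $\mathbf{R} = [R_1,R_2,R_1+R_2]^T$ in (10).

Definition 15. Given an input distribution $p_{U,X}$, define the region $\mathscr{R}(n,\epsilon;p_{U,X}) \subset \mathbb{R}^2$ to be the set of rate pairs
$(R_1,R_2)$ that satisfy

$$\mathbf{R} \in \mathbf{I} - \frac{1}{\sqrt{n}}\,\mathscr{S}(\mathbf{V},\epsilon) - \frac{\nu\log n}{n}\,\mathbf{1}, \tag{36}$$

where $\nu := |\mathcal{U}||\mathcal{X}|\max\{|\mathcal{Y}_1|,|\mathcal{Y}_2|\} + 1$. Here $\mathbf{I} := \mathbf{I}(p_{U,X},W)$ and $\mathbf{V} := \mathbf{V}(p_{U,X},W)$, and the set $\mathscr{S}(\mathbf{V},\epsilon) \subset \mathbb{R}^3$
is the one defined for the SW problem.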
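As a sanity check on Definitions 13 and 14, the sketch below evaluates the mean and covariance of the ABC information density vector by enumeration. It is an illustration under our own assumptions, not the authors' code: the name `abc_info_stats`, the list layouts, the base-2 logarithm, and the toy channel (receiver 1 sees $X$ noiselessly, receiver 2 sees $X$ through a binary symmetric channel with crossover 0.2, and $X$ is $U$ flipped with probability 0.1) are all hypothetical choices.

```python
import math
from itertools import product


def abc_info_stats(p_ux, W):
    """Return (I, V) for the ABC information density vector (33).

    Layout (an assumption of this sketch): p_ux[u][x] is the joint pmf of
    (U, X) and W[x][y1][y2] = W(y1, y2 | x). Logs are base 2.
    """
    nu, nx = len(p_ux), len(p_ux[0])
    ny1, ny2 = len(W[0]), len(W[0][0])
    p_u = [sum(row) for row in p_ux]
    p_x = [sum(p_ux[u][x] for u in range(nu)) for x in range(nx)]
    # Marginal channels W1(y1|x) and W2(y2|x)
    W1 = [[sum(W[x][y1]) for y1 in range(ny1)] for x in range(nx)]
    W2 = [[sum(W[x][y1][y2] for y1 in range(ny1)) for y2 in range(ny2)]
          for x in range(nx)]

    def p_y_given_u(Wj, y, u):  # (31), using p(x|u) = p(u,x)/p(u)
        return sum(Wj[x][y] * p_ux[u][x] for x in range(nx)) / p_u[u]

    def p_y(Wj, y):             # (32)
        return sum(Wj[x][y] * p_x[x] for x in range(nx))

    atoms = []
    for u, x, y1, y2 in product(range(nu), range(nx), range(ny1), range(ny2)):
        prob = p_ux[u][x] * W[x][y1][y2]
        if prob == 0.0:
            continue  # skip zero-probability outcomes
        dens = (math.log2(W1[x][y1] / p_y_given_u(W1, y1, u)),
                math.log2(p_y_given_u(W2, y2, u) / p_y(W2, y2)),
                math.log2(W1[x][y1] / p_y(W1, y1)))
        atoms.append((prob, dens))

    I = [sum(p * d[i] for p, d in atoms) for i in range(3)]
    V = [[sum(p * (d[i] - I[i]) * (d[j] - I[j]) for p, d in atoms)
          for j in range(3)] for i in range(3)]
    return I, V


def hb(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


# Illustrative (hypothetical) example: U uniform, X = U xor Bernoulli(0.1),
# Y1 = X, Y2 = X xor Bernoulli(0.2), with Y1 and Y2 conditionally independent.
p_ux = [[0.45, 0.05], [0.05, 0.45]]
W = [[[(1.0 if y1 == x else 0.0) * (0.8 if y2 == x else 0.2)
       for y2 in range(2)] for y1 in range(2)] for x in range(2)]
I_abc, V_abc = abc_info_stats(p_ux, W)
```

For this toy channel the three means should reduce to $H(X|U)$, $1 - H_b(0.26)$ and $H(X) = 1$ bit, which gives an independent check of the enumeration.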



B. Main Result and Interpretation 

Theorem 3 (Inner Bound to Finite Blocklength DM-ABC Capacity Region). Let $\epsilon \in (0,1)$. The $(n,\epsilon)$-capacity
region $\mathscr{C}^*_{\mathrm{ABC}}(n,\epsilon)$ for the DM-ABC satisfies

$$\bigcup_{p_U,\,p_{X|U}} \mathscr{R}(n,\epsilon;p_{U,X}) \subset \mathscr{C}^*_{\mathrm{ABC}}(n,\epsilon) \tag{37}$$

for all $n$ sufficiently large. Furthermore, the union over $p_U$ can be restricted to those discrete distributions with
support $\mathcal{U}$ whose cardinality $|\mathcal{U}| \le |\mathcal{X}| + 6$.

The proof of this result can be found in Section VIII-D.

Conceptually, this result is very similar to that for the DM-MAC. The reason for its inclusion in this paper is to
demonstrate that the proof techniques we have developed here are general and widely applicable to many network
information theory problems, including problems whose capacity regions involve auxiliary random variables. Indeed,
it is easy to apply our vector rate redundancy theorem to prove a finite blocklength analogue of the inner bound
to the $(n,\epsilon)$-capacity region of the discrete memoryless interference channel (DM-IC) [45], [52]. However, for the
DM-IC, there is an additional Fourier-Motzkin step to eliminate the common and private message rates. This step
can indeed be done for deriving the finite blocklength inner bound.

Loosely speaking, for the DM-ABC and for a fixed $p_{U,X}$, the inner bound approaches the capacity region in
(29) at a rate of $O(1/\sqrt{n})$ as prescribed by the multidimensional central limit theorem. The redundancy set is, as per
the DM-MAC, $\mathscr{S}(\mathbf{V},\epsilon)$. It would be interesting to extend the above result to derive a finite blocklength version
of Marton's inner bound [53], [54], which is the best (largest) inner bound for the broadcast channel. This problem,
however, appears to be rather challenging because of the need to generalize the mutual covering lemma [1].



V. Dispersion Along Slices of the $(n,\epsilon)$-Rate Regions

While we have a single-letter characterization of the $(n,\epsilon)$-optimal rate region for SW coding and inner bounds
for the $(n,\epsilon)$-capacity regions for the DM-MAC and DM-ABC problems, it is instructive to study the behavior
along certain "slices" of the region. We would also like to compute, if possible, the dispersion (second-order coding
rate) as a rate pair converges to a particular point on the boundaries of the asymptotic regions. To be concrete, we
will focus solely on the SW setting (Theorem 1) but we note that similar conclusions can be made for the inner
bound to the $(n,\epsilon)$-capacity region for the DM-MAC for fixed input distributions (Theorem 2). We start by defining
the set

$$\tilde{\mathscr{R}}_{\mathrm{SW}}(n,\epsilon) := \left\{(R_1,R_2) \in \mathbb{R}^2 : \mathbf{R} \in \mathbf{H} + \frac{1}{\sqrt{n}}\,\mathscr{S}(\mathbf{V},\epsilon)\right\}. \tag{38}$$

Observe that $\tilde{\mathscr{R}}_{\mathrm{SW}}(n,\epsilon)$ is essentially the same as $\mathscr{R}^*_{\mathrm{SW}}(n,\epsilon)$ but the former ignores the $O(\frac{\log n}{n})$ terms in the inner
and outer bounds of Theorem 1. The set in (38) is shown in Fig. 2. Assume for the rest of this section that the entropy
dispersion matrix $\mathbf{V}(p_{X_1,X_2}) \succ 0$, so in particular, conditioning strictly reduces entropy, i.e., $H(X_1|X_2) < H(X_1)$.
This ensures that the SW region is non-degenerate.

[Figure 2: three subplots (a)-(c) in the $(R_1,R_2)$ plane, with $H_{1|2}$ and $H_1$ marked on the $R_1$ axis and $H_{2|1}$ and $H_2$ marked on the $R_2$ axis.]

Fig. 2. Plots of the approximation to the $(n,\epsilon)$-optimal rate region $\tilde{\mathscr{R}}_{\mathrm{SW}}(n,\epsilon)$ defined in (38) and the asymptotic SW region in (3), whose
boundary is indicated by $\partial\mathscr{R}^*_{\mathrm{SW}}$. We use the simplified notation $H_1 := H(X_1)$, $H_2 := H(X_2)$, $H_{1|2} := H(X_1|X_2)$, $H_{2|1} := H(X_2|X_1)$
and $H_{1,2} := H(X_1,X_2)$. The directions of approach are indicated by the arrows in the different subplots. In subplot (a), we approach the
vertical boundary. The dispersion $F$ is given in (42). In subplot (b), we approach the sum rate boundary. The dispersion $F$ is given in (44). In
subplot (c), we approach the corner point $(H_1,H_{2|1})$. The dispersion $F$ is given implicitly in (46). The point $(R_1(n,\epsilon),R_2(n,\epsilon))$ (defined
in (39) and (40)) and the target point $(R_1^*,R_2^*)$ are also indicated in (c).

As we see in Fig. 2, we can approach a point on the asymptotic SW boundary, denoted as $\partial\mathscr{R}^*_{\mathrm{SW}}$, from a number
of different directions. We formalize this as follows: Let $\theta$ be an angle of approach towards a point $(R_1^*,R_2^*)$ lying
on the SW boundary $\partial\mathscr{R}^*_{\mathrm{SW}}$. We use a single dispersion-like quantity $F = F(\theta,\epsilon)$ to parametrize the approach of a
pair of rates towards $(R_1^*,R_2^*)$. More precisely, define

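$$R_1(n,\epsilon) := R_1^* + \sqrt{\frac{F}{n}}\,(\cos\theta)\,Q^{-1}(\epsilon) \tag{39}$$

$$R_2(n,\epsilon) := R_2^* + \sqrt{\frac{F}{n}}\,(\sin\theta)\,Q^{-1}(\epsilon). \tag{40}$$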



In other words, $(R_1(n,\epsilon),R_2(n,\epsilon))$ is a rate pair on the boundary of the region $\tilde{\mathscr{R}}_{\mathrm{SW}}(n,\epsilon)$ that approaches $(R_1^*,R_2^*)$
at a rate governed by the angle of approach $\theta$, the error probability in the form of the Gaussian approximation
term $Q^{-1}(\epsilon)$, and the effective dispersion $F$. The following proposition leverages Theorem 1 to quantify the
minimum $F$ achieving $\epsilon$ probability of error for various $(R_1^*,R_2^*,\theta)$. To state our result cleanly, we define

$$\Psi(\rho;x,y) := \frac{1}{2\pi\sqrt{1-\rho^2}} \int_x^{\infty}\!\!\int_y^{\infty} \exp\left\{-\frac{u^2 - 2\rho uv + v^2}{2(1-\rho^2)}\right\} dv\,du \tag{41}$$

as the bivariate generalization of the $Q$-function.

Proposition 4 (Dispersion Along Slices of $(n,\epsilon)$-SW Region). Let $\tau_n$ be an exponentially decaying sequence that
may change from line to line. There are five different cases. First, assume that $(R_1^*,R_2^*)$ is not a corner point. In
particular, let $R_1^* = H(X_1|X_2)$ and let $R_2^* > H(X_2)$ (vertical boundary). Then,



$$F = \frac{[\mathbf{V}]_{1,1}}{\cos^2\theta} + \tau_n. \tag{42}$$



Similarly, if $R_2^* = H(X_2|X_1)$ and $R_1^* > H(X_1)$ (horizontal boundary), then

$$F = \frac{[\mathbf{V}]_{2,2}}{\sin^2\theta} + \tau_n. \tag{43}$$



Similarly, if $R_1^* + R_2^* = H(X_1,X_2)$, $R_1^* > H(X_1|X_2)$ and $R_2^* > H(X_2|X_1)$ (sum rate boundary), then

$$F = \frac{[\mathbf{V}]_{3,3}}{(\cos\theta + \sin\theta)^2} + \tau_n. \tag{44}$$

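Now assume that $(R_1^*,R_2^*)$ is a corner point. In particular, let $R_1^* = H(X_1|X_2)$ and $R_2^* = H(X_2)$. Then $F$ is the
solution to

$$\Psi\left(\rho_{1,3};\, -\sqrt{\frac{F}{[\mathbf{V}]_{1,1}}}\,(\cos\theta)\,Q^{-1}(\epsilon),\, -\sqrt{\frac{F}{[\mathbf{V}]_{3,3}}}\,(\cos\theta + \sin\theta)\,Q^{-1}(\epsilon)\right) = 1 - \epsilon + \tau_n, \tag{45}$$

where $\rho_{1,3} := [\mathbf{V}]_{1,3}/\sqrt{[\mathbf{V}]_{1,1}[\mathbf{V}]_{3,3}}$ is the correlation coefficient of the random variables $-\log p_{X_1|X_2}(X_1|X_2)$
and $-\log p_{X_1,X_2}(X_1,X_2)$. Finally, if $R_1^* = H(X_1)$ and $R_2^* = H(X_2|X_1)$, then $F$ is the solution to

$$\Psi\left(\rho_{2,3};\, -\sqrt{\frac{F}{[\mathbf{V}]_{2,2}}}\,(\sin\theta)\,Q^{-1}(\epsilon),\, -\sqrt{\frac{F}{[\mathbf{V}]_{3,3}}}\,(\cos\theta + \sin\theta)\,Q^{-1}(\epsilon)\right) = 1 - \epsilon + \tau_n, \tag{46}$$

where $\rho_{2,3} := [\mathbf{V}]_{2,3}/\sqrt{[\mathbf{V}]_{2,2}[\mathbf{V}]_{3,3}}$ is defined analogously.

Footnote: The sequence $\{\tau_n\}_{n\ge 1}$ is exponentially decaying if $\limsup_{n\to\infty} \frac{1}{n}\log\tau_n < 0$.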

This proposition is proved in the appendix. Intuitively, for (42)-(44), two (out of three) of the probabilities of the
constituent events that comprise $\{\mathbf{Z} \le \sqrt{n}(\mathbf{R}(n,\epsilon) - \mathbf{H})\}$ decay exponentially fast since we are operating "far
from" the corresponding constraints. For (45) and (46), only one such probability decays exponentially fast so two
constraints are still active.
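The five cases of Proposition 4 can be evaluated numerically once the exponentially decaying term is dropped. The sketch below is our own illustration, under the assumptions that $\Psi$ in (41) is the upper-orthant probability of a standard bivariate Gaussian with correlation $\rho$, that $\epsilon < 1/2$ and $0 < \theta < 3\pi/4$, and that $|\rho| < 1$. The function names, the Simpson-rule integration, the bisection tolerance, and the numerical values of $[\mathbf{V}]_{2,2}$, $[\mathbf{V}]_{3,3}$ and $[\mathbf{V}]_{2,3}$ are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def q_inv(eps):
    """Inverse of the Gaussian tail function Q."""
    return _STD.inv_cdf(1.0 - eps)


def psi(rho, x, y, steps=2000, cutoff=9.0):
    """P(A >= x, B >= y) for standard normals A, B with correlation rho.

    Conditions on A = u, where P(B >= y | A = u) = Q((y - rho*u)/sqrt(1-rho^2)),
    and integrates over u with Simpson's rule (steps must be even).
    """
    lo, hi = max(x, -cutoff), cutoff
    if lo >= hi:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(u):
        return _STD.pdf(u) * (1.0 - _STD.cdf((y - rho * u) / scale))

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(lo + k * h)
    return total * h / 3.0


def dispersion_vertical(v11, theta):    # (42) without tau_n
    return v11 / math.cos(theta) ** 2


def dispersion_horizontal(v22, theta):  # (43) without tau_n
    return v22 / math.sin(theta) ** 2


def dispersion_sum_rate(v33, theta):    # (44) without tau_n
    return v33 / (math.cos(theta) + math.sin(theta)) ** 2


def corner_lhs(F, v22, v33, v23, theta, eps):
    """Left-hand side of (46) as a function of F."""
    rho = v23 / math.sqrt(v22 * v33)
    c, r = q_inv(eps), math.sqrt(F)
    s, cs = math.sin(theta), math.cos(theta) + math.sin(theta)
    return psi(rho, -r * s * c / math.sqrt(v22), -r * cs * c / math.sqrt(v33))


def dispersion_corner(v22, v33, v23, theta, eps, rel_tol=1e-9):
    """Solve (46) for F by bisection, ignoring tau_n."""
    lo, hi = 0.0, 1.0
    while corner_lhs(hi, v22, v33, v23, theta, eps) < 1.0 - eps:
        hi *= 2.0  # the left-hand side increases with F under the assumptions
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if corner_lhs(mid, v22, v33, v23, theta, eps) < 1.0 - eps:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical dispersion entries, for illustration only.
v22, v33, v23 = 0.5, 0.6, 0.3
theta, eps = math.pi / 4, 0.1
F_corner = dispersion_corner(v22, v33, v23, theta, eps)
F_scalar = max(dispersion_horizontal(v22, theta), dispersion_sum_rate(v33, theta))
```

Because a joint probability never exceeds either of its marginals, the corner value `F_corner` can be no smaller than the larger of the two scalar values in `F_scalar`, which is one way to see why two constraints remain active at a corner point.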

The interpretations of (42) to (44) are fairly clear. The dispersion-like quantity $F$ depends on the angle of
approach $\theta$ to the boundary and the corresponding entry on the diagonal of the entropy dispersion matrix. So for
example, if we are approaching the vertical boundary as in Fig. 2(a), then substituting the result (42) into (39) and
(40) yields

$$R_1(n,\epsilon) = H(X_1|X_2) + \sqrt{\frac{[\mathbf{V}]_{1,1}}{n}}\,Q^{-1}(\epsilon) + \tau_n \tag{47}$$

$$R_2(n,\epsilon) = H(X_2) + \sqrt{\frac{[\mathbf{V}]_{1,1}}{n}}\,(\tan\theta)\,Q^{-1}(\epsilon) + \tau_n, \tag{48}$$

which is simply a scalar dispersion result. If instead we are approaching either one of the corner points, say
$(H(X_1),H(X_2|X_1))$, at an angle $\theta$ as in Fig. 2(c), then the situation is much more complicated and $F$ is given as
in (46). Intuitively, this is because there are several forces at play: the contribution from the marginal dispersion
$[\mathbf{V}]_{2,2}$, the contribution from the sum rate dispersion $[\mathbf{V}]_{3,3}$ and also their correlation coefficient $\rho_{2,3}$. These interact
to give an effective dispersion that can only be expressed implicitly in the form shown in (46). Note that now $F$
depends on the angle of approach and the correlation coefficient of $-\log p_{X_2|X_1}(X_2|X_1)$ and $-\log p_{X_1,X_2}(X_1,X_2)$,
namely $\rho_{2,3}$. Interestingly, $F$ also depends on $\epsilon$, unlike the corresponding scalar results in (42)-(44). We will verify
this numerically in Section VI-A3.

We now interpret Proposition 4 operationally from the perspective of another source coding problem. Consider
the $(n,\epsilon)$-region for lossless source coding with side information at encoders and decoder (SI-ED), also known as
cooperative source coding. Specifically, first consider the problem of source coding $X_1$ with $X_2$ available as (full
non-coded) side information at the encoder and the decoder. Second, we swap the roles of $X_1$ and $X_2$. Third, we
consider a source coding problem for the pair $(X_1,X_2)$. Up to $O(\frac{\log n}{n})$ terms, this region $\mathscr{R}^*_{\mathrm{SI\text{-}ED}}(n,\epsilon)$ is
the set of rate pairs $(R_1,R_2)$ satisfying the three scalar constraints

$$\begin{aligned}
R_1 &\ge H(X_1|X_2) + \sqrt{\frac{[\mathbf{V}]_{1,1}}{n}}\,Q^{-1}(\epsilon)\\
R_2 &\ge H(X_2|X_1) + \sqrt{\frac{[\mathbf{V}]_{2,2}}{n}}\,Q^{-1}(\epsilon)\\
R_1 + R_2 &\ge H(X_1,X_2) + \sqrt{\frac{[\mathbf{V}]_{3,3}}{n}}\,Q^{-1}(\epsilon).
\end{aligned} \tag{49}$$



The three decoupled constraints in (49) represent three single-user simplifications of the problem and therefore
are three crude outer bounds to $\mathscr{R}^*_{\mathrm{SW}}(n,\epsilon)$. The first two inequalities characterizing the region in (49) can be
derived in a straightforward manner using a side information (conditional) version of Strassen's original result [19]
for hypothesis testing. The last inequality is simply one of Strassen's original results on source coding. Also see
Problem 1.1.8 in Csiszár and Körner [14] and Theorem 1 in Kontoyiannis [27]. Our Theorem 1 says that this scalar
perspective on dispersion is insufficient for multi-user problems. However, away from corner points, Proposition 4
asserts that SW coding is very close to the boundaries of the $(n,\epsilon)$-SI-ED region $\mathscr{R}^*_{\mathrm{SI\text{-}ED}}(n,\epsilon)$. For example, we
see from (47), which is a consequence of (42), that the rates $R_1$ for SW coding and SI-ED coding are the same up
to an exponentially decaying term, which is subsumed by the Berry-Esseen residual term of order $O(\frac{\log n}{n})$. Thus,
SW coding and SI-ED coding are essentially the same away from the corner points. At the corner points, the story
is different and more complicated as we observed in (45) and (46).
Fig. 3. Plots of the asymptotic SW boundary (dash-dotted black), the $(n,\epsilon)$-SW boundary (solid blue) in (11), and the $(n,\epsilon)$-SI-ED boundary
(dashed red) in (49) for different blocklengths and error probabilities. Notice that $\tilde{\mathscr{R}}_{\mathrm{SW}}(n,\epsilon)$ and $\mathscr{R}^*_{\mathrm{SI\text{-}ED}}(n,\epsilon)$ are rather different near the
equal rate and corner points when $n$ is small. Plots of $\tilde{\mathscr{R}}_{\mathrm{SW}}(n,\epsilon)$ and $\mathscr{R}^*_{\mathrm{SI\text{-}ED}}(n,\epsilon)$ as functions of $n$ along the equal rate and corner point
slices of $\mathscr{R}^*_{\mathrm{SI\text{-}ED}}(n,\epsilon)$ are given in Figs. 4 and 5 respectively. These are indicated by the black $\triangle$ and the green $\times$ on the plots. The legends
apply to all plots.



VI. Numerical Examples 
In this section, we illustrate the $(n,\epsilon)$-SW region $\mathscr{R}^*_{\mathrm{SW}}(n,\epsilon)$ derived in Theorem 1 and relate it to error exponent
analysis [15]. We also illustrate the inner bound to the $(n,\epsilon)$-capacity region $\mathscr{C}^*_{\mathrm{MAC}}(n,\epsilon)$ for the DM-MAC. Before
we begin, we address some computational issues. In order to, for example, graph the $(n,\epsilon)$-optimal rate region
for the SW problem, we find pairs of rates $(R_1,R_2)$ that lie on the boundary of $\mathscr{R}^*_{\mathrm{SW}}(n,\epsilon)$. To do so, we fix a
point $R_1$ and perform a bisection search for $R_2$. We also use the Matlab function mvncdf, which returns
the cumulative probability of the multivariate normal distribution with a user-specified covariance matrix $\mathbf{V}$.
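The bisection procedure just described can be sketched in a few lines. The code below is not the authors' Matlab implementation: it swaps the `mvncdf` call for a seeded Monte Carlo estimate of the Gaussian cumulative probability (reusing the same samples at every step so that the estimate is monotone in $R_2$), and it assumes that membership in the region means $\Pr(\mathbf{Z} \le \sqrt{n}(\mathbf{R} - \mathbf{H})) \ge 1 - \epsilon$ with $\mathbf{Z} \sim \mathcal{N}(\mathbf{0},\mathbf{V})$. The entropy vector, the positive-definite matrix and all function names are hypothetical values chosen for illustration.

```python
import math
import random


def cholesky3(V):
    """Lower-triangular factor L with L L^T = V (V must be positive-definite)."""
    L = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i + 1):
            s = V[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gaussian_samples(V, count=10000, seed=0):
    """Draw `count` samples of Z ~ N(0, V) with a fixed seed."""
    rng = random.Random(seed)
    L = cholesky3(V)
    out = []
    for _ in range(count):
        g = [rng.gauss(0.0, 1.0) for _ in range(3)]
        out.append(tuple(sum(L[i][k] * g[k] for k in range(i + 1))
                         for i in range(3)))
    return out


def in_region(R1, R2, H, n, eps, samples):
    """Monte Carlo test of Pr(Z <= sqrt(n) (R - H)) >= 1 - eps."""
    root = math.sqrt(n)
    z = (root * (R1 - H[0]), root * (R2 - H[1]), root * (R1 + R2 - H[2]))
    hits = sum(1 for s in samples
               if s[0] <= z[0] and s[1] <= z[1] and s[2] <= z[2])
    return hits >= (1.0 - eps) * len(samples)


def boundary_r2(R1, H, n, eps, samples, tol=1e-6):
    """Smallest R2 (by bisection) such that (R1, R2) passes in_region."""
    lo, hi = 0.0, H[1] + 10.0
    if not in_region(R1, hi, H, n, eps, samples):
        return None  # R1 itself is too small for any R2 to work
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if in_region(R1, mid, H, n, eps, samples):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical values: H = [H(X1|X2), H(X2|X1), H(X1,X2)] in bits.
H = [0.5, 0.5, 1.2]
V = [[0.5, 0.1, 0.3], [0.1, 0.5, 0.3], [0.3, 0.3, 0.6]]
samples = gaussian_samples(V)
r2_short = boundary_r2(0.75, H, n=1000, eps=0.1, samples=samples)
r2_long = boundary_r2(0.75, H, n=10000, eps=0.1, samples=samples)
```

Reusing one set of samples keeps the membership test deterministic, which is what makes a plain bisection safe here; an exact multivariate normal CDF routine would serve the same purpose.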



A. Distributed Lossless Source Coding: Slepian-Wolf 

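In this section, we use an example to illustrate the $(n,\epsilon)$-SW region $\mathscr{R}^*_{\mathrm{SW}}(n,\epsilon)$. We neglect the small $O(\frac{\log n}{n})$
terms in (11) and (12) throughout so in fact we will plot $\tilde{\mathscr{R}}_{\mathrm{SW}}(n,\epsilon)$ defined in (38). The source is taken to be

$$p_{X_1,X_2} = \begin{bmatrix} 1-3a & a\\ a & a \end{bmatrix} \tag{50}$$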
where $a = 0.1$. This source has a positive-definite entropy dispersion matrix $\mathbf{V}(p_{X_1,X_2})$. In addition, $H(X_1) =
H(X_2) = H_b(2a)$ and $H(X_1|X_2) = H(X_2|X_1) = (1-2a)H_b(\frac{a}{1-2a}) + 2a$ for this $p_{X_1,X_2}$.
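The stated properties of this source can be verified by direct enumeration. The sketch below is our own check and assumes that the joint pmf in (50) has entries $1-3a$, $a$, $a$, $a$ (rows indexed by $x_1$, columns by $x_2$), which is the form consistent with the entropies quoted above. It computes the entropies, the entropy dispersion matrix, and its leading principal minors (all positive if and only if the matrix is positive-definite). Function and variable names are hypothetical and logarithms are base 2.

```python
import math


def hb(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def det3(M):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


a = 0.1
pmf = [[1.0 - 3.0 * a, a], [a, a]]  # assumed form of (50): pmf[x1][x2]
p1 = [sum(pmf[x1]) for x1 in range(2)]                       # marginal of X1
p2 = [sum(pmf[x1][x2] for x1 in range(2)) for x2 in range(2)]  # marginal of X2

atoms = []  # (probability, entropy density vector) for each outcome
for x1 in range(2):
    for x2 in range(2):
        p = pmf[x1][x2]
        dens = (-math.log2(p / p2[x2]),  # -log p(x1|x2)
                -math.log2(p / p1[x1]),  # -log p(x2|x1)
                -math.log2(p))           # -log p(x1,x2)
        atoms.append((p, dens))

# Mean vector H = [H(X1|X2), H(X2|X1), H(X1,X2)] and covariance matrix V
H_vec = [sum(p * d[i] for p, d in atoms) for i in range(3)]
V_src = [[sum(p * (d[i] - H_vec[i]) * (d[j] - H_vec[j]) for p, d in atoms)
          for j in range(3)] for i in range(3)]

H_X1 = -sum(q * math.log2(q) for q in p1)
leading_minors = [V_src[0][0],
                  V_src[0][0] * V_src[1][1] - V_src[0][1] * V_src[1][0],
                  det3(V_src)]
```

Under the assumed pmf, `H_X1` agrees with $H_b(2a)$, `H_vec[0]` agrees with $(1-2a)H_b(\frac{a}{1-2a}) + 2a$, and all three leading minors are positive.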

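1) Comparison to Asymptotic SW Region and $(n,\epsilon)$-SI-ED Region: In Fig. 3, we plot the boundary of the SW
region $\mathscr{R}^*_{\mathrm{SW}}$ and the boundary of $\tilde{\mathscr{R}}_{\mathrm{SW}}(n,\epsilon)$ for different error probabilities $\epsilon$ and blocklengths $n$. We also plot
the boundary of the $(n,\epsilon)$-SI-ED region defined in (49). From Fig. 3, we see that since $\mathbf{V} \succ 0$, $\tilde{\mathscr{R}}_{\mathrm{SW}}(n,\epsilon)$ has
a curved boundary, reflecting the correlatedness of $X_1$ and $X_2$ and the non-degeneracy in $\mathbf{V}$. We observe from
the top plots of Fig. 3 that $\tilde{\mathscr{R}}_{\mathrm{SW}}(n,\epsilon)$ approaches the asymptotic SW boundary $\partial\mathscr{R}^*_{\mathrm{SW}}$ as $n \to \infty$. In addition, the
bottom plots of Fig. 3 show that $\tilde{\mathscr{R}}_{\mathrm{SW}}(n,\epsilon)$ approaches $\partial\mathscr{R}^*_{\mathrm{SW}}$ as $\epsilon \to 1/2$.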

2) Behaviour Along Certain Slices of the Regions: There are several interesting "slices" of the regions shown
in Fig. 3. Firstly, we are interested in the equal rate slice, which follows along the 45° diagonal line. This is the
scenario in Fig. 2(b) with $\theta = \pi/4$. Secondly, we are also interested in the slice passing through the origin $(0,0)$
and a corner point of $\mathscr{R}^*_{\mathrm{SI\text{-}ED}}(n,\epsilon)$, namely $(R_1(n,\epsilon),R_2(n,\epsilon))$ defined as follows:

$$\begin{aligned}
R_2(n,\epsilon) &:= \min\{R_2 : (R_1,R_2) \in \mathscr{R}^*_{\mathrm{SI\text{-}ED}}(n,\epsilon) \text{ for some } R_1\}\\
R_1(n,\epsilon) &:= \min\{R_1 : (R_1,R_2(n,\epsilon)) \in \mathscr{R}^*_{\mathrm{SI\text{-}ED}}(n,\epsilon)\}.
\end{aligned} \tag{51}$$
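Since the SI-ED region is described by three decoupled scalar constraints, the corner point in (51) has a simple closed form: $R_2(n,\epsilon)$ is the right-hand side of the second constraint in (49), and $R_1(n,\epsilon)$ is the larger of the first constraint and the sum-rate constraint minus $R_2(n,\epsilon)$. The sketch below illustrates this, ignoring the $O(\frac{\log n}{n})$ terms; the function names and the numerical values of the entropies and dispersions are hypothetical.

```python
import math
from statistics import NormalDist


def si_ed_thresholds(H, V_diag, n, eps):
    """Right-hand sides of the three scalar constraints in (49)."""
    q_inv = NormalDist().inv_cdf(1.0 - eps)
    return [h + math.sqrt(v / n) * q_inv for h, v in zip(H, V_diag)]


def si_ed_corner(H, V_diag, n, eps):
    """Corner point (R1, R2) of the SI-ED region as defined in (51)."""
    b1, b2, b3 = si_ed_thresholds(H, V_diag, n, eps)
    r2 = b2                # smallest feasible R2 (take R1 large enough)
    r1 = max(b1, b3 - r2)  # smallest R1 compatible with that R2
    return r1, r2


# Hypothetical values: H = [H(X1|X2), H(X2|X1), H(X1,X2)] in bits and the
# diagonal of the entropy dispersion matrix.
H = [0.5, 0.5, 1.2]
V_diag = [0.5, 0.5, 0.6]
corner_short = si_ed_corner(H, V_diag, n=1000, eps=0.1)
corner_long = si_ed_corner(H, V_diag, n=10**6, eps=0.1)
```

As $n$ grows, the corner point in this sketch moves towards $(H(X_1), H(X_2|X_1))$, which equals $(0.7, 0.5)$ for the hypothetical values above.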

Note that the latter in (51) is not the scenario shown in Fig. 2(c) because we are not approaching $(H(X_1),H(X_2|X_1))$
along a straight line of a fixed slope. These two slices are indicated by markers ($\triangle$, $\times$) in Fig. 3. The sum rates
along both slices are plotted as functions of $n$ in Figs. 4 and 5 respectively. We observe from Fig. 4 that the two
sum rates on the 45° equal rate line approach each other as $n$ grows. We computed their difference and noted that
it decays as $\exp(-\Theta(n))$. This is corroborated by (44) in Proposition 4, where the difference is denoted by the
exponentially decaying term $\tau_n$. This term is dominated by the $O(\frac{\log n}{n})$ residual term in Theorem 1. We conclude,
as we did theoretically in Section V, that when $n$ is sufficiently large (as seen in Fig. 4), there is
essentially no difference in performing SW coding versus cooperative encoding (SI-ED) if we wish to optimize
(minimize) the sum rate. The SW and SI-ED dispersions are the same and are equal to $[\mathbf{V}]_{3,3}$.

On the other hand, from Fig. 5, we see that the corresponding difference in corner points decays at a much slower rate of $\mu n^{-1/2}$ for some constant $\mu > 0$. Thus, the corner rate dispersions are different and consequently, if we wish to operate in the neighbourhood of a corner point, SW coding loses second-order coding rate relative to the cooperative scenario. Intuitively, this is due to the fact that the effective dispersion approaching $(H(X_1), H(X_2|X_1))$ is a complicated function of the conditional dispersion $[\mathbf{V}]_{2,2}$, the joint dispersion $[\mathbf{V}]_{3,3}$ and their correlation coefficient $\rho_{2,3}$. We discuss this in greater detail next.

Fig. 5. Comparison between the $(n,\epsilon)$-SW and $(n,\epsilon)$-SI-ED corner rates as well as their difference on a log-log plot. See (51) for definitions. In contrast to Fig. 4, note that the horizontal axis is $\log_{10}(n)$, where $n$ is the blocklength. The difference decays at a rate of $\Theta(n^{-1/2})$ so the dispersions of SW and SI-ED coding along this corner rate slice are different.




Fig. 6. Plot of $\log_{10}(F(\theta,\epsilon))$ against $\theta \in (0, 3\pi/4)$ for different $\epsilon$'s. This plot shows the "effective dispersion" (or second-order coding rate) as we approach the corner point $(H(X_1), H(X_2|X_1))$ from various angles. See Fig. 2(c) and the accompanying definition of $F(\theta,\epsilon)$.



3) Approaching a Corner Point at an Angle $\theta$: To better understand how the dispersion (or second-order coding rate) varies as a function of the angle of approach to a corner point, let us consider the setup in Fig. 2(c) and the definitions of $R_1(n,\epsilon)$ and $R_2(n,\epsilon)$ in (39) and (40) respectively. Recall that this rate pair $(R_1(n,\epsilon), R_2(n,\epsilon))$ lies on the boundary of $\mathscr{R}^*_{\mathrm{SW}}(n,\epsilon)$ and is parametrized by $F$, a dispersion-like parameter. For the source defined in (50), we solve for $F$ in (46). Hence, we are approaching the corner point $(H(X_1), H(X_2|X_1))$ at an angle $\theta$. Note that $F$ is in general a function of $\theta$ and $\epsilon$ so we write $F = F(\theta,\epsilon)$.* In Fig. 6, we plot $\log_{10}(F(\theta,\epsilon))$ as a function of $\theta \in (0, 3\pi/4)$ for different $\epsilon$'s. It can be seen that as $\theta \to 0$, the second-order coding rate $F(\theta,\epsilon)$ increases. This agrees with intuition because when $\theta$ is small, we are approaching the corner point $(H(X_1), H(X_2|X_1))$ almost parallel to the horizontal boundary of $\mathscr{R}^*_{\mathrm{SW}}(n,\epsilon)$. When $\theta$ is close to $3\pi/4$, similarly, we are almost parallel to the sum rate boundary. In either of these two cases, we are very close to the boundary of the $(n,\epsilon)$-rate region. On the other hand, when $\theta$ is moderate (say $\theta \approx 3\pi/8$), the rate pair is further into the interior of $\mathscr{R}^*_{\mathrm{SW}}(n,\epsilon)$, hence the effective dispersion is smaller. The constant $3\pi/8$ is in fact not arbitrary because the angle between the horizontal boundary and the sum rate boundary of $\mathscr{R}^*_{\mathrm{SW}}(n,\epsilon)$ is exactly $3\pi/4$. Hence $3\pi/8$ is the half-angle, which means that the rate pair is furthest away from either boundary. However, the smallest dispersion does not occur at exactly $\theta = 3\pi/8$ because of some asymmetry between the entropy densities $-\log p_{X_2|X_1}(X_2|X_1)$ and $-\log p_{X_1,X_2}(X_1,X_2)$. One would also expect $F(\theta,\epsilon)$ to be independent of $\epsilon$ [cf. (42)-(44)]. However, Fig. 6 shows that this is not the case. We conclude that the rate of convergence toward a corner point is $O(\frac{1}{\sqrt{n}})$ but the coefficient is a complicated function of $\theta$, $\epsilon$ and the model parameters in the $2 \times 2$ submatrix $[\mathbf{V}]_{2:3,2:3}$.

*In fact, $F$ is also a function of $n$, as can be seen in Proposition 4. However, it has an exponentially small dependence on $n$. In Fig. 6, we neglect this exponentially small term when computing $F(\theta,\epsilon)$.

4) Comparison to Error Exponent Analysis: In this section, we estimate the required blocklengths to attain an error probability $\epsilon$ using dispersion analysis and error exponent analysis and we compare these estimates. Before doing so, we remind the reader of the error exponent setup and existing results. A rate pair $(R_1, R_2)$ in the (interior of the) SW region is fixed. One then asks how rapidly the error probability decays as a function of $n$. We attempt to find the error exponent (or reliability function) $E(R_1,R_2)$ defined as



$$E(R_1,R_2) := \limsup_{n\to\infty} -\frac{1}{n}\log P_e^{(n)}, \tag{52}$$

where $P_e^{(n)}$ is the smallest possible error probability of length-$n$ SW block codes with compression rates $R_1$ and $R_2$. Gallager [15] derived a lower bound to the error exponent under maximum-likelihood decoding (MLD) for lossless source coding of $X_1$ with decoder side information $X_2$. This was followed up by Koshelev [16], who derived a lower bound to the error exponent for the two-encoder SW problem (which is our setup in Section II). In particular, Koshelev [16] showed that the error probability under MLD can be bounded from above as



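$$P_e^{(n)} \le 3\exp_2\Big[-n \max_{\rho\in[0,1]} \min\big\{E_{1|2}(R_1,\rho),\, E_{2|1}(R_2,\rho),\, E_{1,2}(R_1,R_2,\rho)\big\}\Big], \tag{53}$$

where $\exp_2(t) := 2^t$ and the constituent exponents are defined as

\begin{align}
E_{1|2}(R_1,\rho) &:= \rho R_1 - \log \sum_{x_2}\Big(\sum_{x_1} p_{X_1,X_2}(x_1,x_2)^{\frac{1}{1+\rho}}\Big)^{1+\rho}, \tag{54}\\
E_{1,2}(R_1,R_2,\rho) &:= \rho(R_1+R_2) - \log \Big(\sum_{x_1,x_2} p_{X_1,X_2}(x_1,x_2)^{\frac{1}{1+\rho}}\Big)^{1+\rho}. \tag{55}
\end{align}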



The exponent $E_{2|1}(R_2,\rho)$ is similar to $E_{1|2}(R_1,\rho)$ with 1 replaced by 2 and vice versa. Thus, the error exponent in (52) can be lower bounded as

$$E(R_1,R_2) \ge \underline{E}(R_1,R_2) := \max_{\rho\in[0,1]} \min\big\{E_{1|2}(R_1,\rho),\, E_{2|1}(R_2,\rho),\, E_{1,2}(R_1,R_2,\rho)\big\}. \tag{56}$$

The following facts can be readily verified [15], [29] and are indeed well-known:

\begin{align}
\frac{\partial}{\partial\rho} E_{1|2}(R_1,\rho)\Big|_{\rho=0} &= R_1 - H(X_1|X_2), \tag{57}\\
-\frac{\partial^2}{\partial\rho^2} E_{1|2}(R_1,\rho)\Big|_{\rho=0} &= \mathrm{Var}\big[-\log p_{X_1|X_2}(X_1|X_2)\big]. \tag{58}
\end{align}

Similar relations hold for the derivatives of $E_{2|1}(R_2,\rho)$ and $E_{1,2}(R_1,R_2,\rho)$. Eq. (57), together with the analogous results for $E_{2|1}(R_2,\rho)$ and $E_{1,2}(R_1,R_2,\rho)$, implies that if $(R_1,R_2)$ belongs to the interior of $\mathscr{R}^*_{\mathrm{SW}}$, the lower bound $\underline{E}(R_1,R_2)$ is positive so $P_e^{(n)}$ decays exponentially fast. Eq. (58) shows that the second derivative of the exponent with respect to the tilting parameter $\rho$ evaluated at $\rho = 0$ is, up to sign, precisely the (conditional) dispersion. This relation has found several applications in moderate deviations analysis for information-theoretic problems [29]-[31].

Fig. 7. Comparisons of the required blocklengths estimated by dispersion and error exponent analyses. The horizontal axis of each panel is the deviation from the SW boundary $\eta$ and the vertical axis is $\log_{10}(n)$, where $n$ is the blocklength. See further descriptions in (59) and (60).
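The derivative relations (57) and (58) can be checked numerically. The following is a minimal sketch, assuming a hypothetical binary joint pmf `P` (it is not the source in (50)) and natural logarithms, so that the identities hold without a unit conversion. The function names are illustrative only. It evaluates the Gallager-style exponent $E_{1|2}(R_1,\rho)$ and compares central finite differences at $\rho = 0$ with $R_1 - H(X_1|X_2)$ and the conditional dispersion.

```python
import math

# Hypothetical joint pmf P[x1][x2] on {0,1} x {0,1}; it is NOT the source in (50).
P = [[0.4, 0.1],
     [0.1, 0.4]]

def e_1_given_2(r1, rho, p=P):
    """Gallager-style exponent rho*R1 - ln sum_{x2} (sum_{x1} p^{1/(1+rho)})^{1+rho}, in nats."""
    total = 0.0
    for x2 in range(len(p[0])):
        inner = sum(p[x1][x2] ** (1.0 / (1.0 + rho)) for x1 in range(len(p)))
        total += inner ** (1.0 + rho)
    return rho * r1 - math.log(total)

def cond_entropy_and_dispersion(p=P):
    """Return H(X1|X2) in nats and Var[-ln p(X1|X2)] in nats^2."""
    p_x2 = [sum(p[x1][x2] for x1 in range(len(p))) for x2 in range(len(p[0]))]
    mean = second = 0.0
    for x1 in range(len(p)):
        for x2 in range(len(p[0])):
            if p[x1][x2] > 0:
                dens = -math.log(p[x1][x2] / p_x2[x2])
                mean += p[x1][x2] * dens
                second += p[x1][x2] * dens * dens
    return mean, second - mean * mean

def derivatives_at_zero(r1, step=1e-3):
    """Central finite differences of rho -> E_{1|2}(R1, rho) at rho = 0."""
    e_plus, e_zero, e_minus = (e_1_given_2(r1, s) for s in (step, 0.0, -step))
    first = (e_plus - e_minus) / (2.0 * step)
    second = (e_plus - 2.0 * e_zero + e_minus) / step ** 2
    return first, second

if __name__ == "__main__":
    h, v = cond_entropy_and_dispersion()
    d1, d2 = derivatives_at_zero(r1=0.6)
    print(d1, 0.6 - h)  # first derivative vs. R1 - H(X1|X2)
    print(-d2, v)       # negative second derivative vs. conditional dispersion
```

The two printed pairs should agree up to the finite-difference error, which illustrates why the exponent is locally quadratic in the rate backoff with curvature set by the dispersion.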



As is also well-known, the types-based characterization by Csiszár and Körner [34] (see also [14, Probs. 3.1.5-6]) coincides with the Gallager-style exponents in (53)-(56), i.e., the exponents are the same. See [14, Prob. 1.2.13]. Even though Csiszár-Körner-style exponents are more intuitive, they are less amenable to numerical evaluation because they involve optimizing over joint distributions on $\mathcal{X}_1 \times \mathcal{X}_2$ rather than a single scalar tilting parameter $\rho \in [0,1]$. Thus, we choose to compare our bounds to Gallager-style error exponents. Upper bounds on the error exponent for SW coding are provided in [14, Prob. 3.1.7]. It is also known that they match the lower bound $\underline{E}(R_1,R_2)$ in (56) when $(R_1,R_2)$ is within but close to the boundary of the achievable region [14, Prob. 3.1.7]. Thus for low rates, (56) is in fact an equality. This is likened to channel coding where the sphere-packing and random coding exponents coincide above the critical rate [14, Sec. 2.5]. This does not come as a surprise because of the duality between Slepian-Wolf coding and channel coding [14, Sec. 3.2].

We study estimates of the required blocklength $n$ in the following way. We parametrize $(R_1,R_2)$ as $R_1(\eta) := (1+\eta)H(X_1|X_2)$, $R_2(\eta) := (1+\eta)H(X_2|X_1)$ for some positive number $\eta$. We ensured that $\eta$ is sufficiently large so that $R_1(\eta) + R_2(\eta) > H(X_1,X_2)$ and hence the rate pair $(R_1(\eta), R_2(\eta)) \in \mathscr{R}^*_{\mathrm{SW}}$. With this parametrization, as the deviation from the SW boundary $\eta$ increases, the rate pair moves further into the interior of the SW region. We can then use two methods to estimate the critical $n$ required to achieve a target error probability $\epsilon$. The first is dispersion analysis: we solve for the $n$ in the bounds we derived in Theorem 1, i.e., the least integer $n_D$ satisfying

$$P\big[\mathbf{Z} \le \sqrt{n}(\mathbf{R} - \mathbf{H})\big] \ge 1 - \epsilon, \tag{59}$$

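where $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \mathbf{V})$ and $\mathbf{V}$ is the dispersion of $p_{X_1,X_2}$. The second method is error exponent analysis: for the rate pair $(R_1,R_2)$, we solve for the lower bound to the error exponent $\underline{E}(R_1,R_2)$ in (56) and invert the relationship in (53) to obtain the estimate

$$n_E := \left\lceil \frac{1}{\underline{E}(R_1,R_2)} \log_2 \frac{3}{\epsilon} \right\rceil. \tag{60}$$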
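To make the two estimates concrete, the sketch below inverts an upper bound of the form $3 \cdot 2^{-nE} \le \epsilon$, as in (53), and, purely for illustration, a scalar analogue of (59) in which a single rate constraint with variance `dispersion` is active. The scalar reduction and the function names are assumptions of this sketch; the vector condition (59) requires a trivariate Gaussian probability, which is not attempted here.

```python
import math
from statistics import NormalDist

def blocklength_from_exponent(exponent_bits, eps):
    """Smallest n with 3 * 2**(-n * exponent_bits) <= eps, cf. the bound (53)."""
    if exponent_bits <= 0:
        raise ValueError("the exponent must be positive")
    return math.ceil(math.log2(3.0 / eps) / exponent_bits)

def blocklength_from_scalar_dispersion(rate, entropy, dispersion, eps):
    """Least n with P[Z <= sqrt(n) * (rate - entropy)] >= 1 - eps for Z ~ N(0, dispersion).

    This is a one-dimensional simplification of (59), valid for rate > entropy.
    """
    if rate <= entropy:
        raise ValueError("the rate must exceed the entropy")
    q_inv = NormalDist().inv_cdf(1.0 - eps)  # Gaussian quantile Q^{-1}(eps)
    return math.ceil(dispersion * (q_inv / (rate - entropy)) ** 2)

if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the two formulas.
    print(blocklength_from_exponent(exponent_bits=0.01, eps=1e-3))
    print(blocklength_from_scalar_dispersion(rate=0.55, entropy=0.5,
                                             dispersion=0.3, eps=1e-3))
```

Both functions return the least integer meeting the respective condition, which mirrors the "least integer" convention used for $n_D$.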

Plots of $n_D$ and $n_E$ as functions of $\eta$ and $\epsilon$ are shown in Fig. 7 for the source in (50). Firstly, we observe that $n$ decreases as $\eta$ increases, which agrees with our intuition that smaller blocklengths are required if the compression rates are large. Both dispersion and error exponent analysis exhibit the same trend. Secondly, we observe that dispersion analysis generally predicts a smaller blocklength compared to Gallager-style error exponent analysis. As
an example, when i] = 0.1 and e = 10"'^, no ~ 9.9 x 10^ while ns ~ 1-6 x 10^. Thus, the required blocklength 
estimated by dispersion analysis is about 39% less than that estimated by error exponent analysis for this setting. Dispersion analysis is, in a sense, more delicate (or finer) than error exponent analysis [15], [16], [34]. The difference, however, is less pronounced as the rates increase. In other words, the estimates of $n$ become closer (on a linear, not logarithmic, scale) as the rates are further into the SW region.

Fig. 8. Plots of the MAC boundary in (15) and the $(n,\epsilon)$-inner bound in (25) for different blocklengths and error probabilities. Here $p_{X_1} = \mathrm{Bern}(0.1)$ and $p_{X_2} = \mathrm{Bern}(0.9)$ are fixed input distributions. The time-sharing variable $Q = \emptyset$. The legends apply to all plots.



B. Multiple-Access Channel 

In this section, we illustrate the differences between the asymptotic MAC region and the finite blocklength MAC region $\mathscr{C}^*_{\mathrm{MAC}}(n,\epsilon)$. We only graph the inner bound in Definition 10 as we do not yet have an outer bound.

We fix a channel with binary inputs and binary outputs: 



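$$W(\,\cdot\,|x_1,x_2) = \begin{cases} [\,\bar{b} \;\; b\,] & \text{if } x_1 \oplus x_2 = 0,\\ [\,b \;\; \bar{b}\,] & \text{if } x_1 \oplus x_2 = 1, \end{cases} \tag{61}$$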



where $b = 0.1$ and $\bar{b} := 1 - b$. We also fix input distributions $p_{X_1} = \mathrm{Bern}(b)$ and $p_{X_2} = \mathrm{Bern}(\bar{b})$. The time-sharing variable $Q = \emptyset$. This channel and these input distributions result in a positive-definite information dispersion matrix $\mathbf{V}$ defined in (24). These settings also ensure that we have symmetry in the sense that $I(X_1;Y) = I(X_2;Y)$ and $I(X_1;Y|X_2) = I(X_2;Y|X_1)$. In Fig. 8, we graph the MAC region in (15), which is well known to be a pentagon. We also plot the inner bound in (25) for the chosen $p_{X_1}, p_{X_2}$ for different values of the blocklength $n$ and error probability $\epsilon$. Note again that the $O(\frac{\log n}{n})$ term is neglected because we are primarily concerned in this paper with first- and second-order coding rates. In addition, we plot the $(n,\epsilon)$-SI-ED boundary. This is the analogue of (49) for the SW case. More precisely, the vertical (resp. horizontal) boundary represents the $(n,\epsilon)$-capacity of a DMC when $X_2$ (resp. $X_1$) is available as side information at the decoder together with channel output $Y$. The sloping (dotted-red) line $R_1 + R_2 = I(X_1,X_2;Y) - \sqrt{[\mathbf{V}]_{3,3}/n}\, Q^{-1}(\epsilon)$ represents the $(n,\epsilon)$-sum rate constraint of the DM-MAC. Operationally, it represents the $(n,\epsilon)$-capacity of the "cooperative MAC" but with independent messages. These are three crude outer bounds that turn out to be insufficient for the DM-MAC.

We see that as in the SW case, the inner bound to the $(n,\epsilon)$-capacity region is, as expected, strictly contained in the asymptotic MAC region. Furthermore, it has a curved boundary in general and the larger the blocklength $n$, the closer the inner bound is to the asymptotic region. The different behaviours of the $(n,\epsilon)$-region along the equal rate line and the line that passes through a corner point [cf. (51)] are qualitatively similar to the SW case described in Section VI-A so we will omit the corresponding plots. We do note that, in analogy to Proposition 4 for the SW case, at any rate $R_1 < I(X_1;Y)$, the difference between the two different $R_2$'s (asymptotic and finite blocklength) is essentially $\sqrt{[\mathbf{V}]_{2,2}/n}\, Q^{-1}(\epsilon)$ where $[\mathbf{V}]_{2,2} = \mathrm{Var}(\log[W(Y|X_1,X_2)/p_{Y|X_1}(Y|X_1)])$ is the conditional channel dispersion. Finally, the corner points exhibit more complex behaviours; the effective dispersion depends on the angle of approach and the correlation between the marginal and joint information densities.
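The symmetry of the example and the size of the second-order backoff can be checked by direct enumeration. The sketch below assumes that the output of the channel in (61) equals $x_1 \oplus x_2$ with probability $1-b$ (the row order is an assumption of this sketch), with $p_{X_1} = \mathrm{Bern}(0.1)$ and $p_{X_2} = \mathrm{Bern}(0.9)$ as in Fig. 8. The helper names are illustrative. It computes the mean and variance of the conditional information density and the backoff $\sqrt{[\mathbf{V}]_{2,2}/n}\,Q^{-1}(\epsilon)$.

```python
import math
from itertools import product
from statistics import NormalDist

B = 0.1               # crossover parameter b in (61)
P_X1 = [1 - B, B]     # Bern(b)
P_X2 = [B, 1 - B]     # Bern(1 - b)

def channel(y, x1, x2, b=B):
    """W(y | x1, x2): the output equals x1 XOR x2 with probability 1 - b (assumed row order)."""
    return 1 - b if y == (x1 ^ x2) else b

def joint():
    """Joint pmf of (X1, X2, Y) under independent inputs."""
    return {(x1, x2, y): P_X1[x1] * P_X2[x2] * channel(y, x1, x2)
            for x1, x2, y in product((0, 1), repeat=3)}

def marginal(p, keep):
    """Marginalize the joint pmf onto the coordinates listed in `keep`."""
    out = {}
    for key, val in p.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + val
    return out

def cond_info_stats(p, c):
    """Mean and variance (bits, bits^2) of log2[W(Y|X1,X2) / p(Y|X_c)].

    c = 0 conditions on X1 and gives I(X2;Y|X1) with its dispersion [V]_{2,2};
    c = 1 conditions on X2 and gives I(X1;Y|X2).
    """
    p_cy, p_c = marginal(p, (c, 2)), marginal(p, (c,))
    mean = second = 0.0
    for (x1, x2, y), val in p.items():
        xc = (x1, x2)[c]
        dens = math.log2(channel(y, x1, x2) / (p_cy[(xc, y)] / p_c[(xc,)]))
        mean += val * dens
        second += val * dens * dens
    return mean, second - mean * mean

def mutual_info(p, a):
    """I(X_a; Y) in bits, where a = 0 for X1 and a = 1 for X2."""
    p_ay, p_a, p_y = marginal(p, (a, 2)), marginal(p, (a,)), marginal(p, (2,))
    return sum(v * math.log2(v / (p_a[(xa,)] * p_y[(y,)]))
               for (xa, y), v in p_ay.items() if v > 0)

def second_order_backoff(variance, n, eps):
    """sqrt(variance / n) * Q^{-1}(eps)."""
    return math.sqrt(variance / n) * NormalDist().inv_cdf(1.0 - eps)

if __name__ == "__main__":
    p = joint()
    i2, v22 = cond_info_stats(p, 0)
    print(i2, v22, second_order_backoff(v22, n=1000, eps=1e-3))
```

Under these assumptions the two conditional mutual informations coincide, as do the two unconditional ones, which is the symmetry invoked above.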



VII. Discussion and Open Problems 

To summarize, we characterized the (n, e)-optimal rate region for the SW problem. We also provided corre- 
sponding inner bounds for the DM-MAC and DM-ABC problems. We unified our achievability proofs through an 
important theorem known as the vector rate redundancy theorem. We believe this general result would be useful 
in other network information theory problems. 

Clearly, it would be desirable to derive outer bounds for the (n, e)-capacity region of the DM-MAC and DM-ABC. 
We have discussed the difficulties in obtaining tight outer bounds. To derive tight outer bounds for the DM-MAC, it
appears that generalizations of Polyanskiy et al.'s meta (or minimax) strong converse fOl Theorem 26] or Augustin's 
strong converse [50] to multi-terminal settings are required. For the MAC, it was mentioned in Section III-B that a sharpening of Ahlswede's wringing technique [49] seems necessary for a converse proof.

It is also of interest to extend these finite blocklength results to channel and lossy source coding problems with side information. These include the Gel'fand-Pinsker problem (channel coding with non-causal state information at the encoder) and the Wyner-Ziv problem (lossy source coding with side information at the decoder). Again, the authors believe that versatile strong converses, such as those in [55] and [56], have to be developed and strengthened for meaningful finite blocklength results. Preliminary work for channels with random state known at the receiver was presented by Ingber and Feder [25].

Finally, for the relay channel, the most well-known achievability schemes are decode-forward and compress-forward. These coding procedures rely on block-Markov coding [1]. Essentially, one codes over $b$ (correlated) blocks each of length $n$, achieving a rate of approximately $\frac{b-1}{b}R$ for some rate $R$. Given a fixed super-blocklength $N$, how can we resolve the tradeoff between the number of blocks $b$ and the sub-blocklength $n$ to maximize the overall rate subject to an error probability of $\epsilon$?



VIII. Proofs of Main Results 

In this section, we provide the proofs for the results in the previous sections. We start in Section VIII-A by stating and proving a preliminary but important result known as the vector rate redundancy theorem. This result is a generalization of the (scalar) rate redundancy theorem in [22], [23]. We then prove finite blocklength results for the SW problem, the MAC, and the ABC in Sections VIII-B, VIII-C and VIII-D respectively.



A. A Preliminary Result 

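Theorem 5 (Vector Rate Redundancy Theorem). Let $\mathbf{g} : \mathscr{P}(\mathcal{X}) \to \mathbb{R}^d$ be twice continuously differentiable. Let

$$g'_t(x) := \frac{\partial g_t(p_X)}{\partial p_X(x)}\bigg|_{p_X} \tag{62}$$

for $t = 1, \ldots, d$ be the component-wise derivatives of $\mathbf{g}$. Denote the vector of derivatives (the gradient vector) as $\mathbf{g}'(x) = [g'_1(x), \ldots, g'_d(x)]^T$. Let $\mathbf{V} \in \mathbb{R}^{d \times d}$ be the covariance matrix of the random vector $\mathbf{g}'(X)$, i.e.,

$$\mathbf{V} = \mathrm{Cov}_X[\mathbf{g}'(X)] = \mathbb{E}\big[(\mathbf{g}'(X) - \mathbb{E}[\mathbf{g}'(X)])(\mathbf{g}'(X) - \mathbb{E}[\mathbf{g}'(X)])^T\big]. \tag{63}$$

Assume that $\mathrm{rank}(\mathbf{V}) \ge 1$ and $\xi := \mathbb{E}[\|\mathbf{g}'(X) - \mathbb{E}[\mathbf{g}'(X)]\|_2^3] < \infty$. Furthermore, let $X^n = (X_1, \ldots, X_n)$ be an i.i.d. random vector with $X_k \sim p_X(x)$. Let the sequence $\{b_n\}_{n=1}^\infty$ satisfy

$$b_n \ge \frac{\alpha \log n}{n} \tag{64}$$

for any $\alpha > 0$. Then, for any vector $\mathbf{z} \in \mathbb{R}^d$, we have

$$P\Big(\mathbf{g}(P_{X^n}) \ge \mathbf{g}(p_X) + \frac{\mathbf{z}}{\sqrt{n}} - b_n \mathbf{1}\Big) \ge P(\mathbf{Z} \ge \mathbf{z}) + O\Big(\frac{\log n}{\sqrt{n}}\Big), \tag{65}$$

where $P_{X^n} \in \mathscr{P}_n(\mathcal{X})$ is the type of the sequence $X^n$ and $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \mathbf{V})$.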

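Before we prove Theorem 5, let us state Bentkus' version of the multidimensional Berry-Esseen theorem.

Theorem 6 (Bentkus [17]). Let $\mathbf{U}_1, \ldots, \mathbf{U}_n$ be normalized i.i.d. random vectors in $\mathbb{R}^d$ with zero mean and identity covariance matrix, i.e., $\mathbb{E}[\mathbf{U}_1] = \mathbf{0}$ and $\mathrm{Cov}[\mathbf{U}_1] = \mathbf{I}$. Let $\mathbf{S}_n := \frac{1}{\sqrt{n}}(\mathbf{U}_1 + \ldots + \mathbf{U}_n)$ and $\xi = \mathbb{E}[\|\mathbf{U}_1\|_2^3]$. Let $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ be a standard Gaussian random vector in $\mathbb{R}^d$. Then, for all $n \in \mathbb{N}$,

$$\sup_{\mathscr{C} \in \mathfrak{C}_d} |P(\mathbf{S}_n \in \mathscr{C}) - P(\mathbf{Z} \in \mathscr{C})| \le \frac{400\, d^{1/4}\, \xi}{\sqrt{n}}, \tag{66}$$

where $\mathfrak{C}_d$ is the family of all convex, Borel measurable subsets of $\mathbb{R}^d$.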

Bentkus remarks in [17] that the constant 400 in Theorem 6 can be "considerably improved especially for large $d$". For simplicity, we will simply use (66). Because we will frequently encounter random vectors with non-identity covariance matrices and "whitening" is not applicable, it is necessary to modify Theorem 6 as follows:

Corollary 7. Assume the same setup as in Theorem 6 with the exception that $\mathrm{Cov}[\mathbf{U}_1] = \mathbf{V} \succ 0$ and $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \mathbf{V})$. Then (66) becomes

$$\sup_{\mathscr{C} \in \mathfrak{C}_d} |P(\mathbf{S}_n \in \mathscr{C}) - P(\mathbf{Z} \in \mathscr{C})| \le \frac{400\, d^{1/4}\, \xi}{\lambda_{\min}(\mathbf{V})^{3/2} \sqrt{n}}. \tag{67}$$

The proof of the corollary is by simple linear algebra and is presented in Appendix B. We are now ready to prove the important vector rate redundancy theorem.
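To get a feel for how conservative the Berry-Esseen correction is at practical blocklengths, the following sketch evaluates a bound of the form $400\, d^{1/4} \xi / (\lambda_{\min}(\mathbf{V})^{3/2}\sqrt{n})$, as in Corollary 7. The function names and the numerical inputs are hypothetical; in particular, $\xi$ and $\lambda_{\min}(\mathbf{V})$ are passed in as numbers rather than computed from a source.

```python
import math

def berry_esseen_bound(d, xi, lambda_min, n):
    """Evaluate 400 * d**(1/4) * xi / (lambda_min**(3/2) * sqrt(n)), cf. (67)."""
    if lambda_min <= 0:
        raise ValueError("the bound requires a positive-definite covariance matrix")
    return 400.0 * d ** 0.25 * xi / (lambda_min ** 1.5 * math.sqrt(n))

def min_blocklength_for_gap(d, xi, lambda_min, gap):
    """Least n for which the bound above is at most `gap`."""
    return math.ceil((400.0 * d ** 0.25 * xi / (lambda_min ** 1.5 * gap)) ** 2)

if __name__ == "__main__":
    # Hypothetical values: d = 3 as in the SW problem, xi = 1, lambda_min = 1.
    print(berry_esseen_bound(d=3, xi=1.0, lambda_min=1.0, n=10**6))
    print(min_blocklength_for_gap(d=3, xi=1.0, lambda_min=1.0, gap=0.1))
```

With the constant 400, the bound only becomes informative for very large $n$, which is consistent with treating it as a residual term in the asymptotic expansions rather than as a numerical estimate.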

Proof: First we assume that $\lambda_{\min}(\mathbf{V}) > 0$. In the latter part of the proof, we relax this assumption. By Taylor's theorem applied component-wise, we can rewrite $\mathbf{g}(P_{X^n})$ as

$$\mathbf{g}(P_{X^n}) = \mathbf{g}(p_X) + \sum_x \mathbf{g}'(x)[P_{X^n}(x) - p_X(x)] + \mathbf{\Delta}. \tag{68}$$

Recall that $\mathbf{g}$ is twice continuously differentiable and the probability simplex $\mathscr{P}(\mathcal{X})$ is compact. As such, we can conclude that each entry of the second-order residual term $\mathbf{\Delta}$ in (68) can be bounded above as

$$|\Delta_t| \le \beta_t \|P_{X^n} - p_X\|_2^2, \tag{69}$$

where $\beta_t > 0$ is some function of the second-order partial derivatives (Hessian) of $g_t$ with respect to the vector $[p_X(0), p_X(1), \ldots, p_X(|\mathcal{X}|-1)]^T$. Setting $\beta := \max_{t=1,\ldots,d} \beta_t$ gives

$$\|\mathbf{\Delta}\|_\infty \le \beta \|P_{X^n} - p_X\|_2^2. \tag{70}$$
We now evaluate the probability that $\|\mathbf{\Delta}\|_\infty$ exceeds $c_n > 0$:

\begin{align}
P(\|\mathbf{\Delta}\|_\infty > c_n) &\le P(\beta \|P_{X^n} - p_X\|_2^2 > c_n) \tag{71}\\
&\le P(\|P_{X^n} - p_X\|_1^2 > c_n/\beta) \tag{72}\\
&\le 2^{|\mathcal{X}|}\, 2^{-n c_n/(2\beta)}, \tag{73}
\end{align}
where (71) uses the bound on $\|\mathbf{\Delta}\|_\infty$ in (70), (72) follows because the $\ell_1$-norm dominates the $\ell_2$-norm for finite-dimensional vectors, and finally (73) follows from a sharpened bound on the $\ell_1$-deviation of the type from the
generating distribution by Weissman et al. [i57il . Setting c„ := 2/3(|Af| + l)/n establishes that 

P(||A|U > Cn) < -. (74) 

n 



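For convenience, let us denote the left-hand-side (LHS) of (65) as $q_n$. Then, using (68),

$$q_n = P\Big(\sum_x \mathbf{g}'(x)[P_{X^n}(x) - p_X(x)] + \mathbf{\Delta} \ge \frac{\mathbf{z}}{\sqrt{n}} - b_n \mathbf{1}\Big). \tag{75}$$
Now, we note the following fact, which is proved in Appendix C.

Lemma 8. Let $\mathbf{G}$ and $\mathbf{\Delta}$ be random vectors in $\mathbb{R}^d$. Let $\mathbf{v}$ be a vector in $\mathbb{R}^d$. Then for any $\phi > 0$,

$$P(\mathbf{G} + \mathbf{\Delta} \ge \mathbf{v}) \ge P(\mathbf{G} \ge \mathbf{v} + \phi \mathbf{1}) - P(\|\mathbf{\Delta}\|_\infty > \phi). \tag{76}$$

Using the identifications $\mathbf{G} \leftarrow \sum_x \mathbf{g}'(x)[P_{X^n}(x) - p_X(x)]$, $\phi \leftarrow c_n$, $\mathbf{\Delta} \leftarrow \mathbf{\Delta}$ and $\mathbf{v} \leftarrow \mathbf{z}/\sqrt{n} - b_n \mathbf{1}$, we can lower bound the right-hand side of (75) as follows:

\begin{align}
q_n &\ge P\Big(\sum_x \mathbf{g}'(x)[P_{X^n}(x) - p_X(x)] \ge \frac{\mathbf{z}}{\sqrt{n}} - b_n \mathbf{1} + c_n \mathbf{1}\Big) - P(\|\mathbf{\Delta}\|_\infty > c_n) \tag{77}\\
&\ge P\Big(\sum_x \mathbf{g}'(x)[P_{X^n}(x) - p_X(x)] \ge \frac{\mathbf{z}}{\sqrt{n}} - b_n \mathbf{1} + c_n \mathbf{1}\Big) - \frac{1}{n}. \tag{78}
\end{align}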

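In the last inequality, we used the result in (74) for the chosen $c_n$. Because the type puts a probability mass of $\frac{1}{n}$ on each sample $X_k$,

$$\sum_x \mathbf{g}'(x) P_{X^n}(x) = \frac{1}{n}\sum_{k=1}^n \mathbf{g}'(X_k). \tag{79}$$

By definition of the expectation, we also have

$$\sum_x \mathbf{g}'(x) p_X(x) = \mathbb{E}[\mathbf{g}'(X)]. \tag{80}$$

The substitution of (79) and (80) in (78) yields

\begin{align}
q_n &\ge P\Big(\frac{1}{n}\sum_{k=1}^n (\mathbf{g}'(X_k) - \mathbb{E}[\mathbf{g}'(X)]) \ge \frac{\mathbf{z}}{\sqrt{n}} - b_n \mathbf{1} + c_n \mathbf{1}\Big) - \frac{1}{n} \tag{81}\\
&= P\Big(\frac{1}{\sqrt{n}}\sum_{k=1}^n (\mathbf{g}'(X_k) - \mathbb{E}[\mathbf{g}'(X)]) \ge \mathbf{z} - \sqrt{n}(b_n - c_n)\mathbf{1}\Big) - \frac{1}{n}. \tag{82}
\end{align}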

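Now note that the random vectors $\{\mathbf{g}'(X_k) - \mathbb{E}[\mathbf{g}'(X)]\}_{k=1}^n$ are i.i.d. and have zero mean and covariance $\mathbf{V}$ defined in (63). In addition, the set $\{\mathbf{g} \in \mathbb{R}^d : \mathbf{g} \ge \mathbf{z}'\}$ is convex so it belongs to $\mathfrak{C}_d$. Using the multidimensional Berry-Esseen theorem in (67) to further lower bound (82) yields

$$q_n \ge P\big(\mathbf{Z} \ge \mathbf{z} - \sqrt{n}(b_n - c_n)\mathbf{1}\big) - \frac{400\, d^{1/4}\, \xi}{\lambda_{\min}(\mathbf{V})^{3/2}\sqrt{n}} - \frac{1}{n}, \tag{83}$$

where the third moment $\xi = \mathbb{E}[\|\mathbf{g}'(X) - \mathbb{E}[\mathbf{g}'(X)]\|_2^3] < \infty$ by assumption. In addition, we assumed that $\lambda_{\min}(\mathbf{V}) > 0$ so the second term is finite. Now, note that the sequence $\sqrt{n}(b_n - c_n) = O(\frac{\log n}{\sqrt{n}})$ since $b_n$ can be taken to be $\Theta(\frac{\log n}{n})$ from (64) and $c_n = O(\frac{\log n}{n})$. Since $\delta \mapsto P(\mathbf{Z} \ge \mathbf{z} - \delta\mathbf{1})$ is continuously differentiable and monotonically increasing, we have

$$P(\mathbf{Z} \ge \mathbf{z} - \delta\mathbf{1}) = P(\mathbf{Z} \ge \mathbf{z}) + O(\delta) \tag{84}$$
by Taylor's approximation theorem. Applying (84) to (83) with $\delta = \sqrt{n}(b_n - c_n)$ yields the lower bound

$$q_n \ge P(\mathbf{Z} \ge \mathbf{z}) + O\Big(\frac{\log n}{\sqrt{n}}\Big) - \frac{400\, d^{1/4}\, \xi}{\lambda_{\min}(\mathbf{V})^{3/2}\sqrt{n}} - \frac{1}{n}, \tag{85}$$

whence the desired result follows for the case $\mathbf{V} \succ 0$.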

Now we consider the case where $\mathbf{V}$ is singular, but recall that we assume $\mathrm{rank}(\mathbf{V}) \ge 1$. The only step that we have to modify in the proof for the case $\mathbf{V} \succ 0$ is the application of the multidimensional Berry-Esseen theorem in (83). This is because we would be dividing by $\lambda_{\min}(\mathbf{V}) = 0$. To fix this, we reduce the problem to the non-singular case. Concretely, assume that $\mathrm{rank}(\mathbf{V}) = r < d$ and define the zero-mean i.i.d. random vectors $\mathbf{A}_k := \mathbf{g}'(X_k) - \mathbb{E}[\mathbf{g}'(X)]$. Then there exists a $d \times r$ matrix $\mathbf{T}$ such that $\mathbf{A}_k = \mathbf{T}\mathbf{B}_k$ where $\mathbf{B}_k$ are i.i.d. random vectors in $\mathbb{R}^r$ with positive-definite covariance matrix. The matrix $\mathbf{T}$ can be taken to be composed of the $r$ eigenvectors corresponding to the non-zero eigenvalues of $\mathbf{V}$. We can now replace the $\mathbf{A}_k$ vectors in (82) with $\mathbf{T}\mathbf{B}_k$ and apply the multidimensional Berry-Esseen theorem [17] for the vectors $\{\mathbf{B}_k\}_{k=1}^n \subset \mathbb{R}^r$. The theorem clearly applies since the set $\{\mathbf{b} \in \mathbb{R}^r : \mathbf{T}\mathbf{b} \ge \mathbf{z}'\}$ is convex. This gives the same conclusion as in (85). ■

B. Proofs for the Slepian-Wolf Problem 

We now present the proof of Theorem 1 on the $(n,\epsilon)$-optimal rate region for distributed lossless source coding. We present the achievability proof in Section VIII-B1 and the converse proof in Section VIII-B2. We will see that the achievability procedure (coding scheme) is universal. In Section VIII-B3, we discuss the implications of choosing not to use a universal decoding rule but a rule that is akin to maximum-likelihood decoding [15].

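1) Achievability:

Proof: Let $(R_1, R_2)$ be a rate pair in the inner bound $\mathscr{R}_{\mathrm{in}}(n,\epsilon)$ defined in (11).

Codebook Generation: For $j = 1, 2$, randomly and independently assign an index $f_{j,n}(x_j^n) \in [2^{nR_j}]$ to each sequence $x_j^n \in \mathcal{X}_j^n$ according to a uniform probability mass function. The sequences of the same index form a bin, i.e., $\mathcal{B}_j(m_j) := \{x_j^n \in \mathcal{X}_j^n : f_{j,n}(x_j^n) = m_j\}$. Note that $\mathcal{B}_j(m_j), m_j \in [2^{nR_j}]$ are random subsets of $\mathcal{X}_j^n$. The bin assignments are revealed to all parties. In particular, the decoder knows the bin rates $R_j$.

Encoding: Given $x_j^n \in \mathcal{X}_j^n$, encoder $j$ transmits the bin index $f_{j,n}(x_j^n)$. Hence, for a length-$n$ sequence, the rates of $m_1$ and $m_2$ are $R_1$ and $R_2$ respectively.

Decoding: The decoder, upon receipt of the bin indices $(m_1, m_2)$, finds the unique sequence pair $(\hat{x}_1^n, \hat{x}_2^n) \in \mathcal{B}_1(m_1) \times \mathcal{B}_2(m_2)$ such that the empirical entropy vector

$$\hat{\mathbf{H}}(\hat{x}_1^n, \hat{x}_2^n) := \begin{bmatrix} \hat{H}(\hat{x}_1^n | \hat{x}_2^n) \\ \hat{H}(\hat{x}_2^n | \hat{x}_1^n) \\ \hat{H}(\hat{x}_1^n, \hat{x}_2^n) \end{bmatrix} \le \mathbf{R} - \delta_n \mathbf{1}, \tag{86}$$

where the thresholding sequence $\delta_n$ is defined as

$$\delta_n := \Big(|\mathcal{X}_1||\mathcal{X}_2| + \frac{1}{2}\Big) \frac{\log(n+1)}{n}. \tag{87}$$

Define $\mathscr{T}(\mathbf{R}, \delta_n) := \{\mathbf{z} \in \mathbb{R}^3 : \mathbf{z} \le \mathbf{R} - \delta_n \mathbf{1}\}$ to be the typical empirical entropy set. Then, (86) is equivalent to $\hat{\mathbf{H}}(\hat{x}_1^n, \hat{x}_2^n) \in \mathscr{T}(\mathbf{R}, \delta_n)$. If there is more than one pair or no such pair in $\mathcal{B}_1(m_1) \times \mathcal{B}_2(m_2)$, declare a decoding error. Note that our decoding scheme is universal [14], i.e., the decoder does not depend on knowledge of the true distribution $p_{X_1,X_2}$. It does depend on the rate pair, which is known to the decoder since the codebook (bin assignments) is known to all parties.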
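The universal test at the heart of this decoder can be sketched in a few lines. The code below is an illustration only: it computes the empirical entropy vector of a candidate sequence pair from its joint type and checks the threshold condition $\hat{\mathbf{H}} \le \mathbf{R} - \delta_n\mathbf{1}$ with $\delta_n = (|\mathcal{X}_1||\mathcal{X}_2| + \frac12)\log(n+1)/n$. Base-2 logarithms and the function names are assumptions of this sketch, and the search over the bins $\mathcal{B}_1(m_1) \times \mathcal{B}_2(m_2)$ is not implemented.

```python
import math
from collections import Counter

def entropy_bits(counts, n):
    """Entropy in bits of the empirical distribution given by a Counter of occurrences."""
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def empirical_entropy_vector(x1, x2):
    """Return [H^(x1|x2), H^(x2|x1), H^(x1,x2)] computed from the joint type of (x1, x2)."""
    n = len(x1)
    h12 = entropy_bits(Counter(zip(x1, x2)), n)
    h1 = entropy_bits(Counter(x1), n)
    h2 = entropy_bits(Counter(x2), n)
    return [h12 - h2, h12 - h1, h12]

def threshold(n, size1, size2):
    """delta_n = (|X1||X2| + 1/2) * log2(n + 1) / n."""
    return (size1 * size2 + 0.5) * math.log2(n + 1) / n

def passes_universal_test(x1, x2, r1, r2, size1, size2):
    """Check whether the empirical entropy vector lies in the typical empirical entropy set."""
    delta = threshold(len(x1), size1, size2)
    rates = [r1, r2, r1 + r2]
    return all(h <= r - delta
               for h, r in zip(empirical_entropy_vector(x1, x2), rates))

if __name__ == "__main__":
    x1 = [0, 1] * 50          # toy pair with x2 = x1, so both conditional entropies vanish
    x2 = list(x1)
    print(empirical_entropy_vector(x1, x2))
    print(passes_universal_test(x1, x2, 0.7, 0.7, 2, 2))
```

Note that the test uses only the rates and alphabet sizes, never the source distribution, which is what makes the rule universal.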

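Analysis of error probability: Let the sequences sent by the two users be $(X_1^n, X_2^n)$ and let their corresponding bin indices be $(M_1, M_2)$. We bound the probability of error averaged over the random code construction. Clearly, the ensemble probability of error is bounded above by the sum of the probabilities of the following four events:

\begin{align}
\mathcal{E}_1 &:= \{\hat{\mathbf{H}}(X_1^n, X_2^n) \notin \mathscr{T}(\mathbf{R}, \delta_n)\} \tag{88}\\
\mathcal{E}_2 &:= \{\exists\, \tilde{x}_1^n \in \mathcal{B}_1(M_1) \setminus \{X_1^n\} : \hat{\mathbf{H}}(\tilde{x}_1^n, X_2^n) \in \mathscr{T}(\mathbf{R}, \delta_n)\} \tag{89}\\
\mathcal{E}_3 &:= \{\exists\, \tilde{x}_2^n \in \mathcal{B}_2(M_2) \setminus \{X_2^n\} : \hat{\mathbf{H}}(X_1^n, \tilde{x}_2^n) \in \mathscr{T}(\mathbf{R}, \delta_n)\} \tag{90}\\
\mathcal{E}_4 &:= \{\exists\, \tilde{x}_1^n \in \mathcal{B}_1(M_1) \setminus \{X_1^n\},\, \tilde{x}_2^n \in \mathcal{B}_2(M_2) \setminus \{X_2^n\} : \hat{\mathbf{H}}(\tilde{x}_1^n, \tilde{x}_2^n) \in \mathscr{T}(\mathbf{R}, \delta_n)\} \tag{91}
\end{align}
We bound the probabilities of these events in turn. Consider

\begin{align}
P(\mathcal{E}_1) &= 1 - P\big(\mathbf{H}(P_{X_1^n,X_2^n}) \in \mathscr{T}(\mathbf{R}, \delta_n)\big) \tag{92}\\
&= 1 - P\big(\mathbf{H}(P_{X_1^n,X_2^n}) \le \mathbf{R} - \delta_n \mathbf{1}\big) \tag{93}\\
&= 1 - P\Big(\mathbf{H}(P_{X_1^n,X_2^n}) \le \mathbf{H}(p_{X_1,X_2}) + \frac{\mathbf{z}}{\sqrt{n}} + (a_n - \delta_n)\mathbf{1}\Big), \tag{94}
\end{align}

where we made the dependence of the empirical entropy vector on the type explicit in (92). In (93), we invoked the definition of $\mathscr{T}(\mathbf{R}, \delta_n)$. In (94), we used the fact that $\mathbf{R} = \mathbf{H}(p_{X_1,X_2}) + \frac{\mathbf{z}}{\sqrt{n}} + a_n \mathbf{1}$ for some vector $\mathbf{z} \in \mathbb{R}^3$ that satisfies $P(\mathbf{Z} \le \mathbf{z}) \ge 1 - \epsilon$ where $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \mathbf{V})$ and $a_n = \nu \frac{\log n}{n}$ for $\nu = |\mathcal{X}_1||\mathcal{X}_2| + 1$.

We now bound the probability in (94) using the vector rate redundancy theorem with the following identifications: random variable $X \leftarrow (X_1, X_2)$, smooth function $\mathbf{g}(p_{X_1,X_2}) \leftarrow -\mathbf{H}(p_{X_1,X_2})$, evaluation vector $\mathbf{z} \leftarrow -\mathbf{z}$ and sequence $b_n \leftarrow a_n - \delta_n$. Note that the coefficient of $a_n$ here just has to be larger than $|\mathcal{X}_1||\mathcal{X}_2| + 1/2$ for the sequence $a_n - \delta_n = \Theta(\frac{\log n}{n})$ to be positive, satisfying (64). This has been ensured with the choice of $\nu$ in (11). Also, the third moment is uniformly bounded as stated in (187) in Appendix D.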

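With the above identifications and the realization that the matrix $\mathbf{V}$ in the vector rate redundancy theorem equals $\mathrm{Cov}(\mathbf{h}(X_1, X_2))$ (by direct differentiation of the entropy functional),

\begin{align}
P(\mathcal{E}_1^c) &\ge P(\mathbf{Z} \ge -\mathbf{z}) + O\Big(\frac{\log n}{\sqrt{n}}\Big) \tag{95}\\
&= P(\mathbf{Z} \le \mathbf{z}) + O\Big(\frac{\log n}{\sqrt{n}}\Big) \tag{96}\\
&\ge 1 - \epsilon + O\Big(\frac{\log n}{\sqrt{n}}\Big), \tag{97}
\end{align}

where in (96) we used the fact that $P(\mathbf{Z} \ge -\mathbf{z}) = P(\mathbf{Z} \le \mathbf{z})$ because $\mathbf{Z}$ is a zero-mean Gaussian and hence symmetric about the origin. Consequently,

$$P(\mathcal{E}_1) \le \epsilon - O\Big(\frac{\log n}{\sqrt{n}}\Big). \tag{98}$$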



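For the second event, by symmetry and uniformity, $P(\mathcal{E}_2) = P(\mathcal{E}_2 | X_1^n \in \mathcal{B}_1(1))$. For ease of notation, let $p := p_{X_1^n,X_2^n}$. Now consider the chain of inequalities:

\begin{align}
P(\mathcal{E}_2 | X_1^n \in \mathcal{B}_1(1))
&= \sum_{x_1^n, x_2^n} p(x_1^n, x_2^n)\, P\big[\exists\, \tilde{x}_1^n \in \mathcal{B}_1(1) \setminus \{x_1^n\} : \hat{\mathbf{H}}(\tilde{x}_1^n, x_2^n) \in \mathscr{T}(\mathbf{R}, \delta_n) \,\big|\, x_1^n \in \mathcal{B}_1(1)\big] \tag{99}\\
&\le \sum_{x_1^n, x_2^n} p(x_1^n, x_2^n) \sum_{\tilde{x}_1^n \ne x_1^n :\, \hat{\mathbf{H}}(\tilde{x}_1^n, x_2^n) \in \mathscr{T}(\mathbf{R}, \delta_n)} P(\tilde{x}_1^n \in \mathcal{B}_1(1)) \tag{100}\\
&\le \sum_{x_1^n, x_2^n} p(x_1^n, x_2^n) \sum_{\tilde{x}_1^n \ne x_1^n :\, \hat{H}(\tilde{x}_1^n | x_2^n) \le R_1 - \delta_n} P(\tilde{x}_1^n \in \mathcal{B}_1(1)) \tag{101}\\
&= \sum_{x_1^n, x_2^n} p(x_1^n, x_2^n) \sum_{\tilde{x}_1^n \ne x_1^n :\, \hat{H}(\tilde{x}_1^n | x_2^n) \le R_1 - \delta_n} 2^{-nR_1} \tag{102}\\
&\le \sum_{Q \in \mathscr{P}_n(\mathcal{X}_2)} \sum_{x_2^n \in \mathcal{T}_Q} p(x_2^n) \sum_{V \in \mathscr{V}_n(\mathcal{X}_1; Q) :\, H(V|Q) \le R_1 - \delta_n} \sum_{\tilde{x}_1^n \in \mathcal{T}_V(x_2^n)} 2^{-nR_1} \tag{103}\\
&\le \sum_{Q \in \mathscr{P}_n(\mathcal{X}_2)} \sum_{x_2^n \in \mathcal{T}_Q} p(x_2^n) \sum_{V \in \mathscr{V}_n(\mathcal{X}_1; Q) :\, H(V|Q) \le R_1 - \delta_n} 2^{nH(V|Q)}\, 2^{-nR_1} \tag{104}\\
&\le \sum_{x_2^n} p(x_2^n)\, (n+1)^{|\mathcal{X}_1||\mathcal{X}_2|}\, 2^{n(R_1 - \delta_n)}\, 2^{-nR_1}, \tag{105}
\end{align}
where (99) follows from the definition of $\mathcal{E}_2$, (100) follows from the union bound and because for $\tilde{x}_1^n \ne x_1^n$, the events $\{\tilde{x}_1^n \in \mathcal{B}_1(1)\}$, $\{x_1^n \in \mathcal{B}_1(1)\}$ and $\{(X_1^n, X_2^n) = (x_1^n, x_2^n)\}$ are mutually independent, and (101) follows from the inclusion $\{\tilde{x}_1^n : \hat{\mathbf{H}}(\tilde{x}_1^n, x_2^n) \in \mathscr{T}(\mathbf{R}, \delta_n)\} \subset \{\tilde{x}_1^n : \hat{H}(\tilde{x}_1^n | x_2^n) \le R_1 - \delta_n\}$. Equality (102) follows from the uniformity in the random binning. In (103), we first dropped the constraint $\tilde{x}_1^n \ne x_1^n$ and marginalized over $x_1^n$. Then, we partitioned the sum over $x_2^n$ into disjoint type classes indexed by $Q \in \mathscr{P}_n(\mathcal{X}_2)$ and we partitioned the sum over $\tilde{x}_1^n \in \mathcal{X}_1^n$ into sums over stochastic matrices $V \in \mathscr{V}_n(\mathcal{X}_1; Q)$ (for notation, see Section I). In (104), we upper bounded the cardinality of the $V$-shell as $|\mathcal{T}_V(x_2^n)| \le 2^{nH(V|Q)}$ [14, Lem. 1.2.5]. In (105), we used the Type Counting Lemma [14, Eq. (2.5.1)]. By the choice of $\delta_n$ in (87), inequality (105) reduces to

$$P(\mathcal{E}_2) \le \frac{1}{\sqrt{n+1}}. \tag{106}$$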
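For completeness, the arithmetic behind this last reduction can be spelled out. The following worked step assumes base-2 logarithms and the threshold $\delta_n = (|\mathcal{X}_1||\mathcal{X}_2| + \tfrac12)\log(n+1)/n$, and uses only the fact that the probabilities $p(x_2^n)$ sum to one:

```latex
\begin{align*}
(n+1)^{|\mathcal{X}_1||\mathcal{X}_2|}\, 2^{n(R_1-\delta_n)}\, 2^{-nR_1}
  &= (n+1)^{|\mathcal{X}_1||\mathcal{X}_2|}\, 2^{-n\delta_n} \\
  &= (n+1)^{|\mathcal{X}_1||\mathcal{X}_2|}\,
     2^{-\left(|\mathcal{X}_1||\mathcal{X}_2|+\frac{1}{2}\right)\log_2(n+1)} \\
  &= (n+1)^{|\mathcal{X}_1||\mathcal{X}_2|}\,
     (n+1)^{-\left(|\mathcal{X}_1||\mathcal{X}_2|+\frac{1}{2}\right)}
   = (n+1)^{-1/2}.
\end{align*}
```

The extra $\tfrac12$ in the coefficient of $\delta_n$ is therefore exactly what is needed to absorb the polynomial number of types and leave a vanishing term of order $n^{-1/2}$.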

Similarly, $P(\mathcal{E}_3) \le (n+1)^{-1/2}$ and $P(\mathcal{E}_4) \le (n+1)^{-1/2}$.

Together with (98), we conclude that the error probability, averaged over the random binning, is upper bounded as

$$P(\mathcal{E}) \le \sum_{i=1}^4 P(\mathcal{E}_i) \le \epsilon, \tag{107}$$

for all $n$ sufficiently large. Hence, there is a deterministic code whose error probability is no greater than $\epsilon$ if the rate pair $(R_1, R_2)$ belongs to $\mathscr{R}_{\mathrm{in}}(n,\epsilon)$. ■
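2) Converse:

Proof: To prove the outer bound, we use Lemma 7.2.2 in Han [33], which asserts that every $(n, 2^{nR_1}, 2^{nR_2}, \epsilon)$-SW code must satisfy

\begin{align}
\epsilon &\ge P\Big(-\frac{1}{n}\log p_{X_1^n|X_2^n}(X_1^n|X_2^n) \ge R_1 + \gamma \ \text{ or } \ -\frac{1}{n}\log p_{X_2^n|X_1^n}(X_2^n|X_1^n) \ge R_2 + \gamma \nonumber\\
&\qquad\quad \text{ or } \ -\frac{1}{n}\log p_{X_1^n,X_2^n}(X_1^n, X_2^n) \ge R_1 + R_2 + \gamma\Big) - 3(2^{-n\gamma}) \tag{108}\\
&= 1 - P\Big(\frac{1}{n}\mathbf{h}(X_1^n, X_2^n) < \mathbf{R} + \gamma\mathbf{1}\Big) - 3(2^{-n\gamma}), \tag{109}
\end{align}

for any $\gamma > 0$. This result is typically used for proving strong converses for general (non-stationary, non-ergodic) sources but, as we will see, it is also very useful for proving a converse in the finite blocklength setting. Recall that $\mathbf{h}(X_1^n, X_2^n)$ is the entropy density vector in (7) evaluated at $(X_1^n, X_2^n)$. By the memorylessness of the source, it can be written as a sum of i.i.d. random vectors $\{\mathbf{h}(X_{1k}, X_{2k})\}_{k=1}^n$.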

We assume that V ^ 0. The case where V is singular can be handled in exactly the same way as we did 
in the proof of the vector rate redundancy theorem. See discussion after (1851 ). Fix 7 := and define z := 
^/n{'R — H + ^^^1). Now consider the probability in (11091 ). denoted as s„: 



i ^ hiX,,,X2,) <H+A_i^i + ^i 

k=i ^ 
1 " 

^ V(h(Xifc,X2fc)-H) <z 



logn. 



(110) 



(111) 



We are now ready to use the multidimensional Berry-Esseen theorem. We can easily verify that the third moment $\xi_{\mathrm{SW}} = \mathbb{E}[\|\mathbf{h}(X_1, X_2) - \mathbf{H}(p_{X_1,X_2})\|_2^3]$ is uniformly bounded. See (187) in Appendix D. As such, using (67) we can upper bound $s_n$ as follows:



Sn<P 



Z < z 



logn_ 



+ 



400(3i/^)gsw 

Amin(V)3/2V^ 



P(Z < z) -O 



logn 



(112) 
(113) 
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The last step follows by Taylor's approximation theorem. See ( [84l ). On account of ( 11091) and dl 131) . 

,,,_P,z,.,.o(J^)-A 

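which, upon rearrangement, means that $\mathbf{z} \in \mathscr{S}(\mathbf{V}, \epsilon - O(\frac{\log n}{\sqrt{n}}))$. Since $\mathscr{S}(\mathbf{V}, \epsilon') \subset \mathscr{S}(\mathbf{V}, \epsilon)$ if $\epsilon' \le \epsilon$, the vector $\mathbf{z} \in \mathscr{S}(\mathbf{V}, \epsilon)$. This implies that $(R_1, R_2) \in \mathscr{R}_{\mathrm{out}}(n,\epsilon)$ from the definition of $\mathbf{z}$. ■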
3) Comments on the Proof and Universal Decoding: In place of the universal decoding rule in (86), one could use a non-universal one by comparing the normalized entropy density vector (instead of the empirical entropy vector) evaluated at $(\hat{x}_1^n, \hat{x}_2^n)$ with the rate vector, i.e.,



n 



< R - 



(115) 



.log pxr,x?V'^i,-^2, 

In this case, a Taylor expansion as in the proof of the vector rate redundancy theorem [cf. (68)] would not be required because the above criterion can be written as a normalized sum of i.i.d. random vectors. The multidimensional Berry-Esseen theorem can thus be applied directly. Under the decoding strategy in (115), close examination of the proofs shows that there is symmetry between the error probability bounds in the direct and converse parts as in [33, Lemmas 7.2.1-2]. In [36], the authors also suggested a universal strategy for finite blocklength SW coding. They suggested the use of feedback to estimate the source statistics, whereas we use the empirical entropy here, cf. (86).



C. Proofs for the MAC 

We now present the proof of Theorem 2 on the $(n,\epsilon)$-capacity region for the DM-MAC. We present the proof of the inner bound in Section VIII-C1 and the proof that the cardinality of $\mathcal{Q}$ can be restricted to 9 in Section VIII-C2. In Section VIII-C3, we comment on how the proof and the statement of the result can be modified if the input and output alphabets of the MAC are not discrete but are arbitrary.

1) Achievability:

Proof: Fix a finite alphabet $\mathcal{Q}$ and a tuple of input distributions $(p_Q, p_{X_1|Q}, p_{X_2|Q})$. Fix a pair of $(n,\epsilon)$-achievable rates $(R_1, R_2) \in \mathscr{R}(n,\epsilon; p_Q, p_{X_1|Q}, p_{X_2|Q})$. See definitions in Section III-A.

Codebook Generation: Randomly generate a sequence $q^n \sim \prod_{k=1}^n p_Q(q_k)$. For $j = 1, 2$, randomly and conditionally independently generate codewords $x_j^n(m_j) \sim \prod_{k=1}^n p_{X_j|Q}(x_{jk}|q_k)$ where $m_j \in [2^{nR_j}]$. The codebook consisting of $q^n$, $x_1^n(m_1), m_1 \in [2^{nR_1}]$, and $x_2^n(m_2), m_2 \in [2^{nR_2}]$ is revealed to all parties.

Encoding: For $j = 1, 2$, given $m_j \in [2^{nR_j}]$, encoder $j$ sends codeword $x_j^n(m_j) \in \mathcal{X}_j^n$.

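Decoding: The decoder, upon receipt of the output $y^n \in \mathcal{Y}^n$ of the DM-MAC, finds the unique message pair
$(\tilde m_1, \tilde m_2) \in [2^{nR_1}] \times [2^{nR_2}]$ such that the empirical mutual information vector satisfies

$$\hat{\mathbf{I}}(q^n, x_1^n(\tilde m_1), x_2^n(\tilde m_2), y^n) := \begin{bmatrix} \hat I(x_1^n(\tilde m_1) \wedge y^n \,|\, x_2^n(\tilde m_2), q^n) \\ \hat I(x_2^n(\tilde m_2) \wedge y^n \,|\, x_1^n(\tilde m_1), q^n) \\ \hat I(x_1^n(\tilde m_1), x_2^n(\tilde m_2) \wedge y^n \,|\, q^n) \end{bmatrix} \ge \mathbf{R} + \delta_n \mathbf{1}, \tag{116}$$

where $\mathbf{R} = [R_1, R_2, R_1+R_2]^T$ is the rate vector and $\delta_n := \left(|\mathcal{Q}||\mathcal{X}_1||\mathcal{X}_2||\mathcal{Y}| + \tfrac{1}{2}\right)\frac{\log(n+1)}{n}$. If there is no such message pair or there is not a unique message pair,
declare a decoding error. We remind the reader that $\hat I(x_1^n(\tilde m_1) \wedge y^n \,|\, x_2^n(\tilde m_2), q^n)$ is the conditional mutual information
$I(\tilde X_1; \tilde Y | \tilde X_2, \tilde Q)$ where the dummy random variable $(\tilde Q, \tilde X_1, \tilde X_2, \tilde Y)$ has distribution, an $n$-type, $P_{q^n, x_1^n(\tilde m_1), x_2^n(\tilde m_2), y^n}$.
Let $\mathscr{T}(\mathbf{R}, \delta_n) := \{\mathbf{z} \in \mathbb{R}^3 : \mathbf{z} \ge \mathbf{R} + \delta_n \mathbf{1}\}$ be the typical empirical mutual information set. Then the criterion
in (116) can be written compactly as $\hat{\mathbf{I}}(q^n, x_1^n(\tilde m_1), x_2^n(\tilde m_2), y^n) \in \mathscr{T}(\mathbf{R}, \delta_n)$. Note that, unlike typicality set
decoding [1] or maximum-likelihood decoding [33], the decoding rule in (116) is universal, i.e., the decoder does
not need to be given knowledge of the channel statistics $W$.

Analysis of error probability: By the uniformity of the messages $M_1$ and $M_2$ and the random code construction,
we can assume that $(M_1, M_2) = (1, 1)$. The average ensemble error probability is upper bounded by the sum of
the probabilities of the following four events: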

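$$\mathcal{E}_1 := \{\hat{\mathbf{I}}(Q^n, X_1^n(1), X_2^n(1), Y^n) \notin \mathscr{T}(\mathbf{R}, \delta_n)\} \tag{117}$$

$$\mathcal{E}_2 := \{\exists\, \tilde m_1 \ne 1 : \hat{\mathbf{I}}(Q^n, X_1^n(\tilde m_1), X_2^n(1), Y^n) \in \mathscr{T}(\mathbf{R}, \delta_n)\} \tag{118}$$

$$\mathcal{E}_3 := \{\exists\, \tilde m_2 \ne 1 : \hat{\mathbf{I}}(Q^n, X_1^n(1), X_2^n(\tilde m_2), Y^n) \in \mathscr{T}(\mathbf{R}, \delta_n)\} \tag{119}$$

$$\mathcal{E}_4 := \{\exists\, \tilde m_1 \ne 1, \tilde m_2 \ne 1 : \hat{\mathbf{I}}(Q^n, X_1^n(\tilde m_1), X_2^n(\tilde m_2), Y^n) \in \mathscr{T}(\mathbf{R}, \delta_n)\} \tag{120}$$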
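The threshold test behind these events can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function names `empirical_cond_mi`, `delta_n` and `in_typical_set` are hypothetical, logarithms are taken to base 2, and sequences are plain Python lists over arbitrary hashable symbols.

```python
from collections import Counter
from math import log2

def empirical_cond_mi(x, y, u):
    """Empirical conditional mutual information of x and y given u (bits),
    computed from the joint type of three equal-length sequences."""
    n = len(x)
    n_uxy = Counter(zip(u, x, y))
    n_ux = Counter(zip(u, x))
    n_uy = Counter(zip(u, y))
    n_u = Counter(u)
    total = 0.0
    for (a, b, c), cnt in n_uxy.items():
        # P(x,y|u) / (P(x|u) P(y|u)) expressed with raw counts
        ratio = cnt * n_u[a] / (n_ux[(a, b)] * n_uy[(a, c)])
        total += (cnt / n) * log2(ratio)
    return total

def delta_n(n, alphabet_product):
    """Slack (K + 1/2) log2(n+1) / n, where K is the product of alphabet sizes."""
    return (alphabet_product + 0.5) * log2(n + 1) / n

def in_typical_set(q, x1, x2, y, rates, slack):
    """Check whether the empirical mutual information vector is at least
    (R1, R2, R1 + R2) + slack, componentwise."""
    r1, r2 = rates
    i1 = empirical_cond_mi(x1, y, list(zip(x2, q)))   # x1 vs y given (x2, q)
    i2 = empirical_cond_mi(x2, y, list(zip(x1, q)))   # x2 vs y given (x1, q)
    i12 = empirical_cond_mi(list(zip(x1, x2)), y, q)  # (x1, x2) vs y given q
    return i1 >= r1 + slack and i2 >= r2 + slack and i12 >= r1 + r2 + slack
```

The test only uses the joint type of the sequences, which is why the rule needs no knowledge of the channel law.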
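We use the definition of $\mathscr{R}(n,\epsilon; p_Q, p_{X_1|Q}, p_{X_2|Q})$ in (25) to express $\Pr(\mathcal{E}_1)$ as follows:

$$\Pr(\mathcal{E}_1) = 1 - \Pr\big(\hat{\mathbf{I}}(Q^n, X_1^n(1), X_2^n(1), Y^n) \in \mathscr{T}(\mathbf{R}, \delta_n)\big) \tag{121}$$

$$= 1 - \Pr\big(\hat{\mathbf{I}}(Q^n, X_1^n(1), X_2^n(1), Y^n) \ge \mathbf{R} + \delta_n \mathbf{1}\big) \tag{122}$$

$$= 1 - \Pr\Big(\hat{\mathbf{I}}(Q^n, X_1^n(1), X_2^n(1), Y^n) \ge \mathbf{I}(p_Q, p_{X_1|Q}, p_{X_2|Q}, W) + \frac{\mathbf{z}}{\sqrt{n}} - a_n \mathbf{1} + \delta_n \mathbf{1}\Big), \tag{123}$$

where (122) follows from the definition of $\mathscr{T}(\mathbf{R}, \delta_n)$. In (123), we used the definition of $\mathscr{S}(\mathbf{V}, \epsilon)$ to assert that
$\mathbf{z} \in \mathbb{R}^3$ is a vector satisfying $\Pr(\mathbf{Z} \ge \mathbf{z}) \ge 1 - \epsilon$ for $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \mathbf{V})$. Also, the sequence $a_n = \nu \frac{\log n}{n}$ where
$\nu = |\mathcal{Q}||\mathcal{X}_1||\mathcal{X}_2||\mathcal{Y}| + 1$.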

Now we use the vector rate redundancy theorem with the following identifications: random variable $X \leftarrow (Q, X_1, X_2, Y)$,
smooth function $\mathbf{g}(p_Q p_{X_1|Q} p_{X_2|Q} W) \leftarrow \mathbf{I}(p_Q, p_{X_1|Q}, p_{X_2|Q}, W)$, evaluation vector $\mathbf{z} \leftarrow \mathbf{z}$ and
sequence $b_n \leftarrow a_n - \delta_n$. If the coefficient of $a_n$ is larger than that of $\delta_n$, say $\nu = |\mathcal{Q}||\mathcal{X}_1||\mathcal{X}_2||\mathcal{Y}| + 1$ as in (25), $a_n - \delta_n$
is a positive sequence of order $\Theta(\frac{\log n}{n})$, satisfying (64). Also, the third moment $\xi_{\mathrm{mac}} := \mathbb{E}[\|\mathbf{i}(Q, X_1, X_2, Y) - \mathbf{I}(p_Q, p_{X_1|Q}, p_{X_2|Q}, W)\|_2^3]$
is uniformly bounded by (188) in Appendix D. As such, the probability in (123) satisfies

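$$\Pr\Big(\hat{\mathbf{I}}(Q^n, X_1^n(1), X_2^n(1), Y^n) \ge \mathbf{I}(p_Q, p_{X_1|Q}, p_{X_2|Q}, W) + \frac{\mathbf{z}}{\sqrt{n}} - a_n \mathbf{1} + \delta_n \mathbf{1}\Big) \ge \Pr(\mathbf{Z} \ge \mathbf{z}) + O\Big(\frac{\log n}{\sqrt{n}}\Big) \tag{124}$$

$$\ge 1 - \epsilon + O\Big(\frac{\log n}{\sqrt{n}}\Big), \tag{125}$$

where in the first inequality, we used the fact that the $\mathbf{V}$ in the vector rate redundancy theorem coincides with
the information dispersion matrix $\mathbf{V}(p_Q, p_{X_1|Q}, p_{X_2|Q}, W)$. This can easily be verified by direct differentiation of
(conditional) mutual information quantities with respect to the joint distribution $p_{Q,X_1,X_2,Y} := p_Q\, p_{X_1|Q}\, p_{X_2|Q}\, W$.
Combining (123) and (125) yields

$$\Pr(\mathcal{E}_1) \le \epsilon - O\Big(\frac{\log n}{\sqrt{n}}\Big). \tag{126}$$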

To bound the probabilities of $\mathcal{E}_2$, $\mathcal{E}_3$ and $\mathcal{E}_4$, we use the following lemma whose proof is relegated to Appendix E.
This result is a types-based analogue of the (conditional) joint typicality lemma used extensively for channel coding
problems in [1].

Lemma 9 (Atypicality of Empirical Mutual Information). Fix a joint distribution $p_{U,X,Y} = p_U\, p_{X|U}\, p_{Y|U}$, i.e.,
$X - U - Y$ form a Markov chain in that order. Let $(U^n, X^n, Y^n) \sim \prod_{k=1}^n p_{U,X,Y}(u_k, x_k, y_k)$ so $X^n - U^n - Y^n$.
Then for any $t > 0$ and any $n \in \mathbb{N}$, the empirical mutual information $\hat I(X^n \wedge Y^n | U^n)$ satisfies

$$\Pr\big(\hat I(X^n \wedge Y^n | U^n) \ge t\big) \le (n+1)^{|\mathcal{U}||\mathcal{X}||\mathcal{Y}|}\, 2^{-nt}. \tag{127}$$

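Now we use this lemma to bound $\Pr(\mathcal{E}_2)$. By the union bound and the symmetry in the generation of the codewords,

$$\Pr(\mathcal{E}_2) \le \sum_{\tilde m_1 \ne 1} \Pr\big(\hat{\mathbf{I}}(Q^n, X_1^n(\tilde m_1), X_2^n(1), Y^n) \in \mathscr{T}(\mathbf{R}, \delta_n)\big) \tag{128}$$

$$= (\lceil 2^{nR_1} \rceil - 1)\, \Pr\big(\hat{\mathbf{I}}(Q^n, X_1^n(2), X_2^n(1), Y^n) \in \mathscr{T}(\mathbf{R}, \delta_n)\big) \tag{129}$$

$$\le 2^{nR_1}\, \Pr\big(\hat I(X_1^n(2) \wedge Y^n \,|\, X_2^n(1), Q^n) \ge R_1 + \delta_n\big) \tag{130}$$

$$\le 2^{nR_1}\, \Pr\big(\hat I(X_1^n(2) \wedge (X_2^n(1), Y^n) \,|\, Q^n) \ge R_1 + \delta_n\big) \tag{131}$$

$$\le (n+1)^{|\mathcal{Q}||\mathcal{X}_1||\mathcal{X}_2||\mathcal{Y}|}\, 2^{nR_1}\, 2^{-n(R_1 + \delta_n)}, \tag{132}$$

where (130) follows from the inclusion $\{\hat{\mathbf{I}}(Q^n, X_1^n(2), X_2^n(1), Y^n) \in \mathscr{T}(\mathbf{R}, \delta_n)\} \subset \{\hat I(X_1^n(2) \wedge Y^n \,|\, X_2^n(1), Q^n) \ge R_1 + \delta_n\}$
and $\lceil t \rceil - 1 < t$, and (131) follows from the fact that $I(X_1; Y | X_2, Q) \le I(X_1; X_2, Y | Q)$ for any four
random variables $Q, X_1, X_2, Y$. For (132), we applied the atypicality of empirical mutual information lemma with
the following identifications: $t \leftarrow R_1 + \delta_n$, $U \leftarrow Q$, $X \leftarrow X_1$ and $Y \leftarrow (X_2, Y)$. Note that for $\tilde m_1 \ne 1$, $X_1^n(\tilde m_1)$
is conditionally independent of $(X_2^n(1), Y^n)$ given $Q^n$ so the lemma applies. Using the definition of $\delta_n$, we have

$$\Pr(\mathcal{E}_2) \le \frac{1}{\sqrt{n+1}}. \tag{133}$$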

Similarly, $\Pr(\mathcal{E}_3) \le (n+1)^{-1/2}$ and $\Pr(\mathcal{E}_4) \le (n+1)^{-1/2}$. Uniting (126) and (133) reveals that the average probability
of error of the random code ensemble is bounded above as $\Pr(\mathcal{E}) \le \sum_{i=1}^4 \Pr(\mathcal{E}_i) \le \epsilon$. Therefore, there must exist a
code whose average probability of error for the DM-MAC $W$ is bounded above by $\epsilon$ as desired. ■
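The arithmetic that turns the type-counting bound into a $1/\sqrt{n+1}$ term can be checked numerically. The sketch below is only an illustration: the helper names are hypothetical, logarithms are base 2, and `k` stands for the product of the four alphabet sizes.

```python
from math import log2, sqrt

def delta_n(n, k):
    """Slack (k + 1/2) log2(n+1) / n used in the decoding threshold."""
    return (k + 0.5) * log2(n + 1) / n

def union_bound_e2(n, k, r1):
    """Evaluate (n+1)^k * 2^{n r1} * 2^{-n (r1 + delta_n)} in the log domain
    to avoid overflow for large n."""
    log_bound = k * log2(n + 1) + n * r1 - n * (r1 + delta_n(n, k))
    return 2.0 ** log_bound

# Example with binary alphabets (k = 2**4) and an arbitrary illustrative rate.
n, k, r1 = 1000, 16, 0.3
bound = union_bound_e2(n, k, r1)
target = 1.0 / sqrt(n + 1)
```

The rate cancels and the polynomial factor $(n+1)^{k}$ is absorbed by the extra $\tfrac{1}{2}$ in the slack, so `bound` and `target` agree up to floating-point error for any choice of `r1`.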

2) Cardinality Bounds: 

Proof: We now argue that $|\mathcal{Q}|$ can be restricted to be no greater than 9. The following 9 functionals are
continuous in $p_{X_1,X_2|Q} = p_{X_1|Q}\, p_{X_2|Q}$: three mutual information quantities $I(X_1; Y | X_2, Q)$, $I(X_2; Y | X_1, Q)$ and
$I(X_1, X_2; Y | Q)$, three variances on the diagonals of $\mathbf{V}(p_Q, p_{X_1|Q}, p_{X_2|Q}, W)$ and three covariances in the strict
upper triangular part of $\mathbf{V}(p_Q, p_{X_1|Q}, p_{X_2|Q}, W)$. By the support lemma [14, Lemma 3.4] (or Eggleston's theorem),
there exists a discrete random variable $Q'$, whose support has cardinality $|\mathcal{Q}'| \le 9$, that preserves these 9 continuous
functionals in $p_{X_1,X_2|Q}$. Thus, the inner bound is preserved if the auxiliary time-sharing random variable $Q$ is
restricted to have cardinality 9. ■

3) Extension to Arbitrary Alphabets: In place of the universal decoding rule in (116), one could use a non-universal
one by comparing the normalized information density vector (instead of the empirical mutual information
vector) with the rate vector, i.e.,

$$\frac{1}{n} \begin{bmatrix} i(x_1^n(\tilde m_1); y^n \,|\, x_2^n(\tilde m_2), q^n) \\ i(x_2^n(\tilde m_2); y^n \,|\, x_1^n(\tilde m_1), q^n) \\ i(x_1^n(\tilde m_1), x_2^n(\tilde m_2); y^n \,|\, q^n) \end{bmatrix} \ge \mathbf{R} + \delta_n \mathbf{1}, \tag{134}$$

where $i(x_1^n(\tilde m_1); y^n \,|\, x_2^n(\tilde m_2), q^n) := \log[W^n(y^n | x_1^n(\tilde m_1), x_2^n(\tilde m_2)) / p_{Y^n|X_2^n,Q^n}(y^n | x_2^n(\tilde m_2), q^n)]$ and similarly for
the other two information densities. For this non-universal decoding strategy, Taylor expansion as in the proof of
the vector rate redundancy theorem [cf. (68)] would not be required because the above criterion can be written as
a normalized sum of i.i.d. random vectors. One can verify that a simpler version of the vector rate redundancy
theorem can be proved for the decoding rule in (134) if the channel and input distributions are such that the third
moment is bounded. In addition, we need to generalize the atypicality of empirical mutual information lemma
for the steps in (128)-(133) to hold. This can be done using standard Chernoff bounding techniques. Indeed, if
$X - U - Y$ form a Markov chain and $(U^n, X^n, Y^n) \sim \prod_{k=1}^n p_{U,X,Y}(u_k, x_k, y_k)$, then the corresponding tail bound (135)
holds for every $t > 0$. This is the analogue of Lemma 9. Finally, note that we have used i.i.d. codebooks for simplicity. For
the AWGN-MAC, a codebook containing codewords of exact power may result in a smaller dispersion. See [11],
[12] for the single-user case.
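As an illustration of the kind of Chernoff-type (change-of-measure) argument mentioned above, the following sketch checks a standard bound by exact enumeration. It is not claimed to be the exact statement used by the authors. The assumptions are ours: the conditioning variable $U$ is trivial, $X^n$ and $Y^n$ are independent with $Y_k$ drawn from the output marginal of a channel `w`, and the bound checked is $\Pr(\tfrac{1}{n} i(X^n; Y^n) \ge t) \le 2^{-nt}$. The function name `tail_prob` is hypothetical.

```python
from itertools import product
from math import log2

def tail_prob(p_x, w, n, t):
    """Exact P((1/n) i(X^n; Y^n) >= t) when X^n and Y^n are independent,
    X_k ~ p_x and Y_k ~ output marginal of the channel w[x][y]."""
    nx, ny = len(p_x), len(w[0])
    p_y = [sum(p_x[x] * w[x][y] for x in range(nx)) for y in range(ny)]
    # Single-letter information densities; pairs with w[x][y] == 0 have
    # density -infinity and can never help exceed the threshold.
    dens = {(x, y): log2(w[x][y] / p_y[y])
            for x in range(nx) for y in range(ny) if w[x][y] > 0}
    total = 0.0
    for pairs in product(dens.keys(), repeat=n):
        if sum(dens[xy] for xy in pairs) >= n * t:
            prob = 1.0
            for x, y in pairs:
                prob *= p_x[x] * p_y[y]  # product (independent) measure
            total += prob
    return total

# Binary symmetric channel with crossover 0.1 and uniform input.
bsc = [[0.9, 0.1], [0.1, 0.9]]
p = tail_prob([0.5, 0.5], bsc, n=2, t=0.5)
```

Here the threshold is met only when both letters match, which happens with probability $0.25$ under the product measure, below the bound $2^{-1}$.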



D. Proofs for the Asymmetric Broadcast Channel 

We now present the proof of Theorem 3 on the $(n,\epsilon)$-capacity region for the DM-ABC. Conceptually, it is simple:
it uses the superposition coding technique [5] and the vector rate redundancy theorem.
1) Achievability:

Proof: Fix an input alphabet $\mathcal{U}$ and also an input distribution $p_{U,X} \in \mathscr{P}(\mathcal{U} \times \mathcal{X})$. This input distribution induces
the distributions $p_U$ and $p_{X|U}$. Also fix a pair of achievable rates $(R_1, R_2)$ belonging to the region $\mathscr{R}(n,\epsilon; p_{U,X})$
(Definition [Bl ). 

Codebook Generation: Randomly and independently generate $2^{nR_2}$ cloud centers $u^n(m_2) \sim \prod_{k=1}^n p_U(u_k)$, $m_2 \in [2^{nR_2}]$.
For every $m_2$, randomly and conditionally independently generate $2^{nR_1}$ satellite codewords $x^n(m_1, m_2) \sim
\prod_{k=1}^n p_{X|U}(x_k | u_k(m_2))$, $m_1 \in [2^{nR_1}]$. The codebooks consisting of the $u^n$ and $x^n$ codewords are revealed to the
encoder and the two decoders.

Encoding: Given $(m_1, m_2) \in [2^{nR_1}] \times [2^{nR_2}]$, the encoder transmits $x^n(m_1, m_2)$.
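The superposition codebook construction can be sketched as follows. This is a minimal illustration under our own conventions: symbols are integers, messages are indexed from 0 rather than 1, and the function names `generate_superposition_codebook` and `encode` are hypothetical.

```python
import random

def generate_superposition_codebook(n, num_m1, num_m2, p_u, p_x_given_u, seed=0):
    """Draw num_m2 cloud centers i.i.d. from p_u and, for each cloud center,
    num_m1 satellite codewords letter-by-letter from p_x_given_u[u]."""
    rng = random.Random(seed)
    u_symbols = list(range(len(p_u)))
    clouds = [rng.choices(u_symbols, weights=p_u, k=n) for _ in range(num_m2)]
    satellites = {}
    for m2, u_seq in enumerate(clouds):
        for m1 in range(num_m1):
            satellites[(m1, m2)] = [
                rng.choices(range(len(p_x_given_u[u])),
                            weights=p_x_given_u[u], k=1)[0]
                for u in u_seq
            ]
    return clouds, satellites

def encode(satellites, m1, m2):
    """The encoder sends the satellite codeword indexed by both messages."""
    return satellites[(m1, m2)]

# Toy usage: 2 common messages, 4 private messages, blocklength 8.
clouds, sats = generate_superposition_codebook(
    n=8, num_m1=4, num_m2=2,
    p_u=[0.5, 0.5],
    p_x_given_u=[[0.9, 0.1], [0.2, 0.8]],
)
codeword = encode(sats, m1=3, m2=1)
```

Only the satellite codeword is transmitted; the cloud center is shared randomness that both decoders use in their empirical mutual information tests.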

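Decoding: Decoder 2 only has to decode the common message $m_2$. When it receives $y_2^n \in \mathcal{Y}_2^n$, it finds the unique
$\tilde m_2 \in [2^{nR_2}]$ such that

$$\hat I(u^n(\tilde m_2) \wedge y_2^n) \ge R_2 + \delta_n, \tag{136}$$

where the sequence $\delta_n := \left(|\mathcal{U}||\mathcal{X}| \max\{|\mathcal{Y}_1|, |\mathcal{Y}_2|\} + \tfrac{1}{2}\right)\frac{\log(n+1)}{n}$. If there is no such message or there is not a unique
one, declare a decoding error. Decoder 1 has to decode both the common message $m_2$ and its own message $m_1$.
When it receives $y_1^n \in \mathcal{Y}_1^n$, it finds the unique pair $(\tilde m_1, \tilde m_2) \in [2^{nR_1}] \times [2^{nR_2}]$ such that

$$\mathbf{J}(u^n(\tilde m_2), x^n(\tilde m_1, \tilde m_2), y_1^n) := \begin{bmatrix} \hat I(x^n(\tilde m_1, \tilde m_2) \wedge y_1^n \,|\, u^n(\tilde m_2)) \\ \hat I(x^n(\tilde m_1, \tilde m_2) \wedge y_1^n) \end{bmatrix} \ge \begin{bmatrix} R_1 \\ R_1 + R_2 \end{bmatrix} + \delta_n \mathbf{1}. \tag{137}$$

If there is no such message pair or there is not a unique one, again declare a decoding error. For convenience in
stating the error events, we use the notation $\mathscr{T}(R_1, R_2, \delta_n) := \{\mathbf{z} \in \mathbb{R}^2 : z_1 \ge R_1 + \delta_n,\ z_2 \ge R_1 + R_2 + \delta_n\}$.
We remind the reader that the notation $\hat I(x^n(\tilde m_1, \tilde m_2) \wedge y_1^n \,|\, u^n(\tilde m_2))$ denotes the conditional mutual information
$I(\tilde X; \tilde Y | \tilde U)$ where $(\tilde U, \tilde X, \tilde Y)$ is a dummy random variable with distribution, an $n$-type, $P_{u^n(\tilde m_2), x^n(\tilde m_1, \tilde m_2), y_1^n}$.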
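Analysis of Error Probability: By symmetry and the random codebook generation, we can assume that $(M_1, M_2) =
(1, 1)$. The error event at decoder 2, namely $\mathcal{E}_2 := \{\hat M_2 \ne M_2\}$, can be decomposed into the following 2 events:

$$\mathcal{E}_{2,1} := \{\hat I(U^n(1) \wedge Y_2^n) < R_2 + \delta_n\} \tag{138}$$

$$\mathcal{E}_{2,2} := \{\exists\, \tilde m_2 \ne 1 : \hat I(U^n(\tilde m_2) \wedge Y_2^n) \ge R_2 + \delta_n\} \tag{139}$$

Decoder 1's error event, namely $\mathcal{E}_1 := \{\hat M_1 \ne M_1\} \cup \{\hat M_2 \ne M_2\}$, can be decomposed into the following 3 events:

$$\mathcal{E}_{1,1} := \{\mathbf{J}(U^n(1), X^n(1,1), Y_1^n) \notin \mathscr{T}(R_1, R_2, \delta_n)\} \tag{140}$$

$$\mathcal{E}_{1,2} := \{\exists\, \tilde m_1 \ne 1 : \mathbf{J}(U^n(1), X^n(\tilde m_1, 1), Y_1^n) \in \mathscr{T}(R_1, R_2, \delta_n)\} \tag{141}$$

$$\mathcal{E}_{1,3} := \{\exists\, \tilde m_1 \ne 1, \tilde m_2 \ne 1 : \mathbf{J}(U^n(\tilde m_2), X^n(\tilde m_1, \tilde m_2), Y_1^n) \in \mathscr{T}(R_1, R_2, \delta_n)\} \tag{142}$$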



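The vector $\mathbf{J}(u^n, x^n, y_1^n)$ is defined in (137). Clearly the average error probability for the ABC defined in (28) can
be bounded above as

$$P_{\mathrm{e}}^{(n)} \le \Pr(\mathcal{E}_{2,1} \cup \mathcal{E}_{1,1}) + \Pr(\mathcal{E}_{2,2}) + \Pr(\mathcal{E}_{1,2}) + \Pr(\mathcal{E}_{1,3}). \tag{143}$$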

Note that in contrast to the DM-MAC, we bound the probability of the union $\mathcal{E}_{2,1} \cup \mathcal{E}_{1,1}$ instead of bounding the
probabilities of the constituent events separately. This is an important distinction. By doing so, we can use the
vector rate redundancy theorem on an empirical mutual information vector of length 3. See (145) below. We bound
the first term in (143), which can be written as

$$\Pr(\mathcal{E}_{2,1} \cup \mathcal{E}_{1,1}) = 1 - \Pr\big(\hat{\mathbf{I}}(U^n(1), X^n(1,1), Y_1^n, Y_2^n) \ge \mathbf{R} + \delta_n \mathbf{1}\big), \tag{144}$$

where the length-3 empirical mutual information vector is defined as

$$\hat{\mathbf{I}}(u^n, x^n, y_1^n, y_2^n) := \begin{bmatrix} \hat I(x^n \wedge y_1^n \,|\, u^n) \\ \hat I(u^n \wedge y_2^n) \\ \hat I(x^n \wedge y_1^n) \end{bmatrix}. \tag{145}$$

Using the fact that $(R_1, R_2) \in \mathscr{R}(n,\epsilon; p_{U,X})$, we can rewrite (144) as

$$\Pr\big((\mathcal{E}_{2,1} \cup \mathcal{E}_{1,1})^c\big) = \Pr\Big(\hat{\mathbf{I}}(U^n(1), X^n(1,1), Y_1^n, Y_2^n) \ge \mathbf{I}(p_{U,X}, W) + \frac{\mathbf{z}}{\sqrt{n}} - a_n \mathbf{1} + \delta_n \mathbf{1}\Big), \tag{146}$$

where from the definition of $\mathscr{S}(\mathbf{V}, \epsilon)$, $\mathbf{z} \in \mathbb{R}^3$ is a vector satisfying $\Pr(\mathbf{Z} \ge \mathbf{z}) \ge 1 - \epsilon$ and $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \mathbf{V})$. The
sequence $a_n = \nu \frac{\log n}{n}$ with $\nu$ defined in (36). Now we again invoke the vector rate redundancy theorem (Theorem 5)
with the following identifications: random variable $X \leftarrow (U, X, Y_1, Y_2)$, smooth function $\mathbf{g}(p_{U,X} W) \leftarrow \mathbf{I}(p_{U,X}, W)$,
evaluation vector $\mathbf{z} \leftarrow \mathbf{z}$ and sequence $b_n \leftarrow a_n - \delta_n$. Then if the coefficient of $a_n$ is larger than that of $\delta_n$, say
$\nu = |\mathcal{U}||\mathcal{X}| \max\{|\mathcal{Y}_1|, |\mathcal{Y}_2|\} + 1$ as in (36), $a_n - \delta_n$ is a positive sequence of order $\Theta(\frac{\log n}{n})$, satisfying (64).
Furthermore, the third moment $\xi_{\mathrm{abc}} := \mathbb{E}[\|\mathbf{i}(U, X, Y_1, Y_2) - \mathbf{I}(p_{U,X}, W)\|_2^3]$ is uniformly bounded as shown in
(189) in Appendix D. Hence, going through the same argument as for the MAC (see (124)-(125)),

$$\Pr\big((\mathcal{E}_{2,1} \cup \mathcal{E}_{1,1})^c\big) \ge 1 - \epsilon + O\Big(\frac{\log n}{\sqrt{n}}\Big). \tag{147}$$



The rest of the error events can be bounded using the atypicality of empirical mutual information lemma (Lemma 9).
Since the calculations are similar, we focus solely on $\mathcal{E}_{1,2}$. For this event, we have

$$\Pr(\mathcal{E}_{1,2}) \le \sum_{\tilde m_1 \ne 1} \Pr\big(\mathbf{J}(U^n(1), X^n(\tilde m_1, 1), Y_1^n) \in \mathscr{T}(R_1, R_2, \delta_n)\big) \tag{148}$$

$$\le (\lceil 2^{nR_1} \rceil - 1)\, \Pr\big(\mathbf{J}(U^n(1), X^n(2, 1), Y_1^n) \in \mathscr{T}(R_1, R_2, \delta_n)\big) \tag{149}$$

$$\le 2^{nR_1}\, \Pr\big(\hat I(X^n(2, 1) \wedge Y_1^n \,|\, U^n(1)) \ge R_1 + \delta_n\big) \tag{150}$$

$$\le (n+1)^{|\mathcal{U}||\mathcal{X}||\mathcal{Y}_1|}\, 2^{nR_1}\, 2^{-n(R_1 + \delta_n)}. \tag{151}$$

The reasoning for each of these steps is similar to that for the DM-MAC. See steps (128) to (132). The crucial
realization to get from (150) to (151) via the use of the atypicality of empirical mutual information lemma is that
for $\tilde m_1 \ne 1$, the satellite codeword $X^n(\tilde m_1, 1)$ is conditionally independent of $Y_1^n$ given the cloud center $U^n(1)$.
By the choice of $\delta_n$ introduced at the decoding step, we have

$$\Pr(\mathcal{E}_{1,2}) \le \frac{1}{\sqrt{n+1}}. \tag{152}$$

Similarly, $\Pr(\mathcal{E}_{2,2}) \le (n+1)^{-1/2}$ and $\Pr(\mathcal{E}_{1,3}) \le (n+1)^{-1/2}$. This, combined with (143) and (147), shows that the
average error probability for the DM-ABC, defined in (28), is no greater than $\epsilon$. Hence, there exists a deterministic
code whose average error probability is no greater than $\epsilon$ as desired. ■
2) Cardinality Bounds: 

Proof: The bound on $|\mathcal{U}|$ can be argued in the same way as we did for the DM-MAC in Section VIII-C2.
We need $|\mathcal{X}| - 1$ elements to preserve $p_X(x)$, $x \in \{0, \ldots, |\mathcal{X}| - 2\}$ and 7 additional elements to preserve the
two mutual information quantities $I(U; Y_2)$ and $I(X; Y_1 | U)$, two variances along the diagonals of $\mathbf{V}(p_{U,X}, W)$,
i.e., $\mathrm{Var}(\log[W_1(Y_1|X)/p_{Y_1|U}(Y_1|U)])$ and $\mathrm{Var}(\log[p_{Y_2|U}(Y_2|U)/p_{Y_2}(Y_2)])$, and three covariances in the off-diagonal
positions in $\mathbf{V}(p_{U,X}, W)$. Note that $I(X; Y_1)$ and $\mathrm{Var}(\log[W_1(Y_1|X)/p_{Y_1}(Y_1)])$ are automatically preserved given
that we have preserved $p_X(x)$ and they do not depend on $U$. Hence, $|\mathcal{U}| \le |\mathcal{X}| + 6$. ■

Appendix A 
Proof of Proposition 4

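Proof: We now prove (42), i.e., that $F = [\mathbf{V}]_{1,1}/\cos^2\theta + \tau_n$, where $\tau_n$ is exponentially decaying. Recall
that we assumed that $R_1^* = H(X_1|X_2)$ and $R_2^* > H(X_2)$. Define $\delta := R_2^* - H(X_2) > 0$. Let $\mathbf{Z} := (Z_1, Z_2, Z_3) \sim
\mathcal{N}(\mathbf{0}, \mathbf{V})$. From the definition of $\mathscr{S}(\mathbf{V}, \epsilon)$, we see that $F$ is the solution to the equation

$$\Pr\left( \begin{bmatrix} Z_1 \\ Z_2 \\ Z_3 \end{bmatrix} \le \sqrt{n} \begin{bmatrix} R_1(n,\epsilon) - H(X_1|X_2) \\ R_2(n,\epsilon) - H(X_2|X_1) \\ R_1(n,\epsilon) + R_2(n,\epsilon) - H(X_1, X_2) \end{bmatrix} \right) = 1 - \epsilon, \tag{153}$$

where $R_1(n,\epsilon)$ and $R_2(n,\epsilon)$ are defined in (39) and (40) respectively. Also see (59). Note that the condition in
(153) can be rewritten as

$$\Pr(\mathcal{A}_1 \cap \mathcal{A}_2 \cap \mathcal{A}_3) = 1 - \epsilon, \tag{154}$$

where after performing some basic information-theoretic manipulations, we see that the events can be expressed as

$$\mathcal{A}_1 = \{Z_1 \le \sqrt{F}(\cos\theta)\, Q^{-1}(\epsilon)\} \tag{155}$$

$$\mathcal{A}_2 = \{Z_2 \le \sqrt{n}\,(I(X_1; X_2) + \delta) + \sqrt{F}(\sin\theta)\, Q^{-1}(\epsilon)\} \tag{156}$$

$$\mathcal{A}_3 = \{Z_3 \le \sqrt{n}\,\delta + \sqrt{F}(\cos\theta + \sin\theta)\, Q^{-1}(\epsilon)\}. \tag{157}$$

We claim that $\Pr(\mathcal{A}_2^c)$ and $\Pr(\mathcal{A}_3^c)$ converge to zero exponentially fast. Indeed,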



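$$\Pr(\mathcal{A}_2^c) = Q\left( \sqrt{\frac{n}{[\mathbf{V}]_{2,2}}}\,(I(X_1; X_2) + \delta) + \sqrt{\frac{F}{[\mathbf{V}]_{2,2}}}\,(\sin\theta)\, Q^{-1}(\epsilon) \right) \tag{158}$$

$$\le \frac{1}{2} \exp\left( -\frac{n}{2[\mathbf{V}]_{2,2}}\,(I(X_1; X_2) + \delta)^2 \right) =: \tau_{1,n}, \tag{159}$$

where $\tau_{1,n}$ is an exponentially decaying sequence and the inequality is due to the Chernoff bound for the Q-function,
i.e., $Q(t) \le \frac{1}{2} \exp(-\frac{t^2}{2})$ for all $t \ge 0$. By the same argument that led to (159),

$$\Pr(\mathcal{A}_3^c) \le \frac{1}{2} \exp\left( -\frac{n}{2[\mathbf{V}]_{3,3}}\,\delta^2 \right) =: \tau_{2,n}, \tag{160}$$

where $\tau_{2,n}$ is another exponentially decaying sequence. Knowing that $\Pr(\mathcal{A}_2^c)$ and $\Pr(\mathcal{A}_3^c)$ are small means that $\Pr(\mathcal{A}_1)$
must be close to $1 - \epsilon$ from (154). Indeed, we have

$$1 - \epsilon \le \Pr(\mathcal{A}_1) \tag{161}$$

$$\le \Pr(\mathcal{A}_1 \cap \mathcal{A}_2 \cap \mathcal{A}_3) + \Pr(\mathcal{A}_2^c) + \Pr(\mathcal{A}_3^c) \tag{162}$$

$$\le 1 - \epsilon + \tau_{1,n} + \tau_{2,n}, \tag{163}$$

where the second inequality is from the union bound. From (155) and the definition of the Q-function,

$$\Pr(\mathcal{A}_1) = 1 - Q\left( \sqrt{\frac{F}{[\mathbf{V}]_{1,1}}}\,(\cos\theta)\, Q^{-1}(\epsilon) \right). \tag{164}$$

On account of (163),

$$Q^{-1}(\epsilon) \le \sqrt{\frac{F}{[\mathbf{V}]_{1,1}}}\,(\cos\theta)\, Q^{-1}(\epsilon) \le Q^{-1}\big(\epsilon - (\tau_{1,n} + \tau_{2,n})\big). \tag{165}$$

In addition, by Taylor's approximation theorem, $Q^{-1}(\epsilon - (\tau_{1,n} + \tau_{2,n})) = Q^{-1}(\epsilon) + \tau_n'$ for some exponentially
decaying $\tau_n'$. This completes the proof of (42). The proofs of (43) and (44) follow analogously so we omit them.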
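The two numerical facts used in this argument, the Chernoff bound on the Q-function and the exponential decay of the resulting sequence, can be checked with a short sketch. The helper names are hypothetical, and the values passed to `tau_1` are arbitrary illustrative numbers rather than quantities from the paper.

```python
from math import cos, erfc, exp, sqrt

def q_func(t):
    """Gaussian tail probability Q(t) = P(N(0,1) > t)."""
    return 0.5 * erfc(t / sqrt(2.0))

def chernoff_q(t):
    """Chernoff bound (1/2) exp(-t^2 / 2), valid for t >= 0."""
    return 0.5 * exp(-t * t / 2.0)

def tau_1(n, v22, mutual_info, delta):
    """Exponentially decaying sequence (1/2) exp(-n (I + delta)^2 / (2 v22))."""
    return 0.5 * exp(-n * (mutual_info + delta) ** 2 / (2.0 * v22))

def first_order_F(v11, theta):
    """Leading term [V]_{1,1} / cos^2(theta), ignoring the decaying remainder."""
    return v11 / cos(theta) ** 2

# Illustrative values only: v22 = 1, I(X1;X2) = 0.2 bits, delta = 0.1.
decay = [tau_1(n, 1.0, 0.2, 0.1) for n in (100, 200, 400)]
```

Because the decay is exponential in $n$, the remainder is negligible next to the $[\mathbf{V}]_{1,1}/\cos^2\theta$ term even at moderate blocklengths.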

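For (45), note that $(R_1^*, R_2^*)$ is a corner point. In particular, $R_1^* = H(X_1|X_2)$ and $R_2^* = H(X_2)$. Clearly, $F$ is
the solution of the equation:

$$\Pr(\mathcal{B}_1 \cap \mathcal{B}_2 \cap \mathcal{B}_3) = 1 - \epsilon, \tag{166}$$

where the events can be written as

$$\mathcal{B}_1 = \{Z_1 \le \sqrt{F}(\cos\theta)\, Q^{-1}(\epsilon)\} \tag{167}$$

$$\mathcal{B}_2 = \{Z_2 \le \sqrt{n}\, I(X_1; X_2) + \sqrt{F}(\sin\theta)\, Q^{-1}(\epsilon)\} \tag{168}$$

$$\mathcal{B}_3 = \{Z_3 \le \sqrt{F}(\cos\theta + \sin\theta)\, Q^{-1}(\epsilon)\}. \tag{169}$$

By the same argument that led to (159), $\Pr(\mathcal{B}_2^c) \to 0$ exponentially fast. Hence,

$$1 - \epsilon \le \Pr(\mathcal{B}_1 \cap \mathcal{B}_3) \le 1 - \epsilon + \tau_n, \tag{170}$$

where $\tau_n$ decays exponentially fast. By a simple (diagonal) change of coordinates,

$$\Pr(\mathcal{B}_1 \cap \mathcal{B}_3) = \Psi\left( \rho_{1,3};\ -\sqrt{\frac{F}{[\mathbf{V}]_{1,1}}}\,(\cos\theta)\, Q^{-1}(\epsilon),\ -\sqrt{\frac{F}{[\mathbf{V}]_{3,3}}}\,(\cos\theta + \sin\theta)\, Q^{-1}(\epsilon) \right), \tag{171}$$

where $\Psi$ is the bivariate generalization of the Q-function, defined in (41). Eq. (45) follows upon the substitution
of (171) into (170). The result in (46) follows by the same argument with 1 in place of 2 and vice versa. ■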
Appendix B
Proof of Corollary 7

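Proof: We use Theorem 6 to prove Corollary 7. Let $\mathbf{V} = \mathbf{L}\mathbf{L}^T$ be the Cholesky decomposition of the matrix
$\mathbf{V}$, defined in (63). The lower-triangular matrix $\mathbf{L} \in \mathbb{R}^{d \times d}$ is the left Cholesky factor of $\mathbf{V}$. Define the change of
coordinates $\tilde{\mathbf{U}}_k := \mathbf{L}\mathbf{U}_k \in \mathbb{R}^d$ for all $k = 1, \ldots, n$. Then, $\mathrm{Cov}(\tilde{\mathbf{U}}_k) = \mathbb{E}[(\mathbf{L}\mathbf{U}_k)(\mathbf{L}\mathbf{U}_k)^T] = \mathbf{L}\,\mathbb{E}[\mathbf{U}_k \mathbf{U}_k^T]\,\mathbf{L}^T = \mathbf{V}$
because $\mathbb{E}[\mathbf{U}_k \mathbf{U}_k^T] = \mathbf{I}$ by assumption. Substituting this into (66) yields

$$\sup_{\mathscr{C} \in \mathfrak{C}_d} \left| \Pr\Big( \frac{1}{\sqrt{n}} \sum_{k=1}^n \tilde{\mathbf{U}}_k \in \mathbf{L}\mathscr{C} \Big) - \Pr(\mathbf{L}\mathbf{Z} \in \mathbf{L}\mathscr{C}) \right| \le \frac{k_d\, \xi}{\sqrt{n}}, \tag{172}$$

where $k_d$ denotes the constant multiplying $\xi/\sqrt{n}$ on the right-hand side of (66).
Clearly, the family of convex, Borel subsets in $\mathbb{R}^d$, namely $\mathfrak{C}_d$, remains closed under matrix multiplication, i.e.,
$\mathfrak{C}_d = \mathbf{L}\mathfrak{C}_d$. Thus, (172) can be rewritten as

$$\sup_{\mathscr{C} \in \mathfrak{C}_d} \left| \Pr\Big( \frac{1}{\sqrt{n}} \sum_{k=1}^n \tilde{\mathbf{U}}_k \in \mathscr{C} \Big) - \Pr(\tilde{\mathbf{Z}} \in \mathscr{C}) \right| \le \frac{k_d\, \xi}{\sqrt{n}}, \tag{173}$$

where $\tilde{\mathbf{Z}} = \mathbf{L}\mathbf{Z}$ and $\tilde{\mathbf{Z}} \sim \mathcal{N}(\mathbf{0}, \mathbf{V})$. Now, recall that $\xi = \mathbb{E}[\|\mathbf{U}_1\|_2^3]$. We upper bound this quantity as follows:
Replacing $\mathbf{U}_1$ by $\mathbf{L}^{-1}\tilde{\mathbf{U}}_1$ yields

$$\xi = \mathbb{E}\big[\|\mathbf{L}^{-1}\tilde{\mathbf{U}}_1\|_2^3\big] \tag{174}$$

$$= \mathbb{E}\big[(\tilde{\mathbf{U}}_1^T \mathbf{L}^{-T} \mathbf{L}^{-1} \tilde{\mathbf{U}}_1)^{3/2}\big] \tag{175}$$

$$= \mathbb{E}\big[(\tilde{\mathbf{U}}_1^T \mathbf{V}^{-1} \tilde{\mathbf{U}}_1)^{3/2}\big] \tag{176}$$

$$\le \lambda_{\max}(\mathbf{V}^{-1})^{3/2}\, \mathbb{E}\big[\|\tilde{\mathbf{U}}_1\|_2^3\big] \tag{177}$$

$$= \frac{1}{\lambda_{\min}(\mathbf{V})^{3/2}}\, \mathbb{E}\big[\|\tilde{\mathbf{U}}_1\|_2^3\big], \tag{178}$$

where (177) holds because $\mathbf{y}^T \mathbf{A} \mathbf{y} \le \lambda_{\max}(\mathbf{A}) \|\mathbf{y}\|_2^2$ for all vectors $\mathbf{y}$. The proof is completed upon the substitution
of the upper bound in (178) into (173) and the identification of the third moment of $\tilde{\mathbf{U}}_1$, namely $\mathbb{E}[\|\tilde{\mathbf{U}}_1\|_2^3]$.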
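The whitening step and the eigenvalue inequality used in this proof can be verified numerically in the $2 \times 2$ case. The sketch below uses only the standard library; the function names are hypothetical and the closed-form Cholesky and eigenvalue formulas are specific to $2 \times 2$ symmetric positive-definite matrices.

```python
from math import sqrt

def cholesky_2x2(v):
    """Lower-triangular L with L L^T = V for a 2x2 positive-definite V."""
    a, b, c = v[0][0], v[0][1], v[1][1]
    l11 = sqrt(a)
    l21 = b / l11
    l22 = sqrt(c - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def min_eig_2x2(v):
    """Smallest eigenvalue of a symmetric 2x2 matrix."""
    a, b, c = v[0][0], v[0][1], v[1][1]
    return 0.5 * (a + c) - sqrt(0.25 * (a - c) ** 2 + b * b)

def whitened_norm_cubed(v, u):
    """(u^T V^{-1} u)^{3/2}, computed as ||L^{-1} u||_2^3."""
    l = cholesky_2x2(v)
    # Forward substitution solves L w = u.
    w1 = u[0] / l[0][0]
    w2 = (u[1] - l[1][0] * w1) / l[1][1]
    return (w1 * w1 + w2 * w2) ** 1.5

v = [[2.0, 1.0], [1.0, 2.0]]
u = (0.3, 2.0)
lhs = whitened_norm_cubed(v, u)
rhs = (u[0] ** 2 + u[1] ** 2) ** 1.5 / min_eig_2x2(v) ** 1.5
```

For every vector `u`, `lhs` does not exceed `rhs`, with equality when `u` is an eigenvector of the smallest eigenvalue, which is the pointwise version of the step from (176) to (178).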



Appendix C 
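Proof of Lemma 8

Proof: Define the events $\mathcal{F} := \{\mathbf{G} \ge \mathbf{v} + \phi \mathbf{1}\}$ and $\mathcal{G} := \{\boldsymbol{\Delta} \ge -\phi \mathbf{1}\}$. Then, $\mathcal{F} \cap \mathcal{G} \subset \{\mathbf{G} + \boldsymbol{\Delta} \ge \mathbf{v}\}$. As such

$$\Pr(\mathbf{G} + \boldsymbol{\Delta} \ge \mathbf{v}) \ge \Pr(\mathcal{F} \cap \mathcal{G}) = \Pr(\mathcal{F}) - \Pr(\mathcal{F} \cap \mathcal{G}^c) \ge \Pr(\mathcal{F}) - \Pr(\mathcal{G}^c). \tag{179)-(182}$$

In addition, we have

$$\Pr(\mathcal{G}^c) = \Pr(\boldsymbol{\Delta} \not\ge -\phi \mathbf{1}) \le \Pr(\,\ldots\,). \tag{183}$$

The combination of (182) and (183) yields (76) as desired.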



Appendix D 
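Finiteness of Third Moments

In this appendix, we prove that the third moments are finite. For notation, see Sections II-A, III-A and IV-A.
Lemma 10. For the SW, MAC and ABC problems, let the third moments be defined as

$$\xi_{\mathrm{sw}} := \mathbb{E}\big[\|\mathbf{h}(X_1, X_2) - \mathbf{H}(p_{X_1,X_2})\|_2^3\big] \tag{184}$$

$$\xi_{\mathrm{mac}} := \mathbb{E}\big[\|\mathbf{i}(Q, X_1, X_2, Y) - \mathbf{I}(p_Q, p_{X_1|Q}, p_{X_2|Q}, W)\|_2^3\big] \tag{185}$$

$$\xi_{\mathrm{abc}} := \mathbb{E}\big[\|\mathbf{i}(U, X, Y_1, Y_2) - \mathbf{I}(p_{U,X}, W)\|_2^3\big]. \tag{186}$$

Then, all three quantities are uniformly bounded. More precisely,

$$\xi_{\mathrm{sw}} \le 5\sqrt{3}\,(|\mathcal{X}_1| + |\mathcal{X}_2| + |\mathcal{X}_1||\mathcal{X}_2|) \tag{187}$$

$$\xi_{\mathrm{mac}} \le 15\sqrt{3}\,|\mathcal{Y}| \tag{188}$$

$$\xi_{\mathrm{abc}} \le 5\sqrt{3}\,(2|\mathcal{Y}_1| + |\mathcal{Y}_2|). \tag{189}$$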



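Proof: We will only prove the second assertion in (188). The other two assertions for the SW and ABC follow
mutatis mutandis and essentially rely on the fact that the ranges of the random variables are finite.

For brevity, let $A_1$, $A_2$ and $A_3$ be the components of the random vector $\mathbf{i}(Q, X_1, X_2, Y) - \mathbf{I}(p_Q, p_{X_1|Q}, p_{X_2|Q}, W)$, where $\mathbf{i}(Q, X_1, X_2, Y)$ is defined in (22). So for
example, $A_1 := \log[W(Y|X_1, X_2)/p_{Y|X_2,Q}(Y|X_2, Q)] - I(X_1; Y | X_2, Q)$. Because $a \mapsto a^{3/2}$ is convex,

$$\xi_{\mathrm{mac}} = \mathbb{E}\big[(A_1^2 + A_2^2 + A_3^2)^{3/2}\big] \tag{190}$$

$$= 3^{3/2}\, \mathbb{E}\Big[\Big(\frac{A_1^2 + A_2^2 + A_3^2}{3}\Big)^{3/2}\Big] \tag{191}$$

$$\le \sqrt{3} \sum_{j=1}^3 \mathbb{E}\big[|A_j|^3\big]. \tag{192}$$

Subsequently, we simplify notation by dropping the subscripts on the distributions, e.g., $p(y|x_2, q) := p_{Y|X_2,Q}(y|x_2, q)$
[see (20)]. We focus on the first term in the sum in (192), which can be bounded as

$$\mathbb{E}\big[|A_1|^3\big] = \mathbb{E}\left[\left|\log\frac{W(Y|X_1, X_2)}{p(Y|X_2, Q)} - I(X_1; Y | X_2, Q)\right|^3\right] \tag{193}$$

$$\le \mathbb{E}\left[\left|\log\frac{W(Y|X_1, X_2)}{p(Y|X_2, Q)}\right|^3\right] \tag{194}$$

$$\le \mathbb{E}\left[\left|\log\frac{1}{p(Y|X_2, Q)}\right|^3\right] \tag{195}$$

$$= \sum_{q, x_2} p(q)\, p(x_2|q) \sum_{y} p(y|x_2, q) \left(\log\frac{1}{p(y|x_2, q)}\right)^3, \tag{196}$$

where (194) follows from the fact that $t \mapsto t^3$ is monotonically increasing and mutual information is non-negative.
Inequality (195) follows because $W(y|x_1, x_2) \le 1$ for all $(x_1, x_2, y) \in \mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{Y}$. Now by simple calculus, the
function $u \mapsto u(-\log u)^3$ is bounded above by $(3/\ln 2)^3 e^{-3} \approx 4.04 < 5$ for all $u \in [0, 1]$. Hence, (196) reduces to

$$\mathbb{E}\big[|A_1|^3\big] \le 5|\mathcal{Y}|. \tag{197}$$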

All the other terms can be bounded similarly. This completes the proof. ■ 
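The calculus fact used above, that $u(-\log_2 u)^3$ stays below 5 on $[0,1]$, is easy to confirm numerically. The sketch below is illustrative only: the names `g`, `u_star`, `g_max` and `grid_max` are ours, and the grid search is a sanity check rather than a proof.

```python
from math import exp, log, log2

def g(u):
    """u * (-log2 u)^3, extended by continuity with g(0) = 0."""
    return 0.0 if u == 0.0 else u * (-log2(u)) ** 3

# Setting the derivative to zero gives -ln(u) = 3, i.e. u = e^{-3}.
u_star = exp(-3.0)
g_max = (3.0 / log(2.0)) ** 3 * exp(-3.0)

# Coarse grid search over [0, 1] as an independent check of the maximum.
grid_max = max(g(k / 100000) for k in range(100001))
```

The closed-form maximum is about 4.04, so the constant 5 leaves some slack; summing the bound over the output alphabet then gives the $5|\mathcal{Y}|$ term.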

Appendix E 
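Proof of Lemma 9

Proof: For convenience, we introduce dummy random variables $(\tilde U, \tilde X, \tilde Y)$ distributed according to $P_{U^n,X^n,Y^n}$,
the type of $(U^n, X^n, Y^n)$. This means that $p_{\tilde U, \tilde X, \tilde Y} = P_{U^n,X^n,Y^n}$. Then, note that

$$I(\tilde X; \tilde Y | \tilde U) = I(\tilde X; \tilde Y | \tilde U) - \mathbb{E}_{p_{\tilde U, \tilde X, \tilde Y}}\left[\log\frac{p_{X,Y|U}(\tilde X, \tilde Y | \tilde U)}{p_{X|U}(\tilde X | \tilde U)\, p_{Y|U}(\tilde Y | \tilde U)}\right] \tag{198}$$

since $X - U - Y$ form a Markov chain in that order so $p_{X,Y|U}(x, y|u)/(p_{X|U}(x|u)\, p_{Y|U}(y|u)) = 1$ for $(x, y, u) \in
\mathcal{X} \times \mathcal{Y} \times \mathcal{U}$. Let $p_{\tilde X, \tilde Y | \tilde U} := p_{\tilde U, \tilde X, \tilde Y}/p_{\tilde U}$ be the conditional type and let $p_{\tilde X | \tilde U}$ and $p_{\tilde Y | \tilde U}$ be the $\mathcal{X}$- and $\mathcal{Y}$-marginals
of $p_{\tilde X, \tilde Y | \tilde U}$ respectively. Now, by expressing the mutual information $I(\tilde X; \tilde Y | \tilde U)$ as an expectation, we readily see
that (198) simplifies as

$$I(\tilde X; \tilde Y | \tilde U) = D(p_{\tilde X, \tilde Y | \tilde U} \,\|\, p_{X,Y|U} \,|\, p_{\tilde U}) - D(p_{\tilde X | \tilde U} \,\|\, p_{X|U} \,|\, p_{\tilde U}) - D(p_{\tilde Y | \tilde U} \,\|\, p_{Y|U} \,|\, p_{\tilde U}). \tag{199}$$

Because conditional relative entropies in (199) are non-negative,

$$I(\tilde X; \tilde Y | \tilde U) \le D(p_{\tilde X, \tilde Y | \tilde U} \,\|\, p_{X,Y|U} \,|\, p_{\tilde U}). \tag{200}$$

To simplify notation, let $W := p_{X,Y|U}$. Fix $t > 0$. Now consider

$$\Pr\big(I(\tilde X; \tilde Y | \tilde U) \ge t\big) \le \Pr\big(D(p_{\tilde X, \tilde Y | \tilde U} \,\|\, W \,|\, p_{\tilde U}) \ge t\big) \tag{201}$$

$$= \sum_{Q \in \mathscr{P}_n(\mathcal{U})} \sum_{u^n \in \mathcal{T}_Q} p_U^n(u^n) \sum_{\substack{V \in \mathscr{V}_n(\mathcal{X} \times \mathcal{Y}; Q): \\ D(V \| W | Q) \ge t}} W^n(\mathcal{T}_V(u^n) \,|\, u^n) \tag{202}$$

$$\le \sum_{Q \in \mathscr{P}_n(\mathcal{U})} \sum_{u^n \in \mathcal{T}_Q} p_U^n(u^n) \sum_{\substack{V \in \mathscr{V}_n(\mathcal{X} \times \mathcal{Y}; Q): \\ D(V \| W | Q) \ge t}} 2^{-nD(V \| W | Q)} \tag{203}$$

$$\le \sum_{Q \in \mathscr{P}_n(\mathcal{U})} \sum_{u^n \in \mathcal{T}_Q} p_U^n(u^n)\, (n+1)^{|\mathcal{U}||\mathcal{X}||\mathcal{Y}|}\, 2^{-nt} \tag{204}$$

$$= (n+1)^{|\mathcal{U}||\mathcal{X}||\mathcal{Y}|}\, 2^{-nt}, \tag{205}$$

where in (201) we used the bound in (200). For (202), we noted that the type of $u^n$ in the innermost sum is $P_{u^n} = Q$.
In (203), we used [14, Lemma 1.2.6] to upper bound the $W^n(\cdot\,|u^n)$-probability of a $V$-shell. In (204), we applied the
Type Counting Lemma for conditional types [14, Eq. (2.5.1)] which asserts that $|\mathscr{V}_n(\mathcal{X} \times \mathcal{Y}; Q)| \le (n+1)^{|\mathcal{U}||\mathcal{X}||\mathcal{Y}|}$.
This completes the proof. ■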
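The type counting step can be illustrated with a short sketch. The counting formula is the standard stars-and-bars count of compositions, not something taken from the paper, and the function names are hypothetical. A conditional type compatible with a fixed $u^n$ is modelled here as one type on $\mathcal{X} \times \mathcal{Y}$ per symbol of $u^n$, with denominator equal to that symbol's count.

```python
from itertools import product
from math import comb

def num_types(n, k):
    """Number of n-types on an alphabet of size k
    (ways to split n into k non-negative integer counts)."""
    return comb(n + k - 1, k - 1)

def num_types_brute(n, k):
    """Same count by enumerating all k^n sequences (small n, k only)."""
    return len({tuple(seq.count(a) for a in range(k))
                for seq in product(range(k), repeat=n)})

def num_conditional_types(u_counts, xy_size):
    """Number of conditional types compatible with a sequence u^n whose
    symbol counts are u_counts: one n_u-type on X x Y per symbol u."""
    total = 1
    for n_u in u_counts:
        total *= num_types(n_u, xy_size)
    return total

# Example: n = 3, |U| = 2 with counts (2, 1), and |X||Y| = 4.
count = num_conditional_types([2, 1], 4)
poly_bound = (3 + 1) ** (2 * 4)
```

The exact count is far below the polynomial bound, which is all the proof needs: the number of $V$-shells grows polynomially in $n$ while each shell's probability decays exponentially.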